CS109A - Fall 2018: Project Group 14

CS109A Introduction to Data Science:

Lending Club Project

Harvard University
Fall 2018


In [1]:
#RUN THIS CELL 
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)
In [2]:
import numpy as np
import pandas as pd
import datetime
import warnings
warnings.filterwarnings('ignore')

import statsmodels.api as sm
from statsmodels.api import OLS

from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LogisticRegressionCV
from sklearn.linear_model import LassoCV
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import resample

# Plotly visualizations
from plotly import tools
import plotly.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import tensorflow as tf

from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_blobs
import sklearn.metrics as metrics
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier

from sklearn.preprocessing import StandardScaler
import time

import math
from scipy.special import gamma

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
%matplotlib inline

import seaborn as sns
sns.set()
import matplotlib.style
matplotlib.style.use('seaborn-whitegrid')
sns.set_style("white")
from IPython.display import display
init_notebook_mode(connected=True)

Overview

This notebook contains the following sections:

  • Part 1: Preparing the data
  • Part 2: Resample to achieve balanced classes
  • Part 3: Scale the data and generate train/test splits
  • Part 4: Determine significant predictors
  • Part 5: Exploring the data
Part 1: Preparing the data

This notebook uses the cleaned CSV data file data_cleaned_2016_2017.csv downloaded from https://drive.google.com/open?id=1LCk-dDFC7O_6ek1i0IIGqE07Rq-kf1Xz.

The cleaned dataset still needs some pre-processing before it is ready for modelling. This includes:

  • Removing several more columns that are not informative, for example because they duplicate other information or contain only a single value
  • Dropping the last digits of the zip code
  • Recoding some ordinal variables into numerical scales
  • Recoding some categorical variables into dummy variables
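
The four steps listed above can be sketched on a toy frame as follows. This is a minimal illustration, not the notebook's actual pre-processing: the toy values, the choice of `grade` as the ordinal variable, the assumed ordering A < B < C, and the choice of `home_ownership` for the dummy variables are all assumptions made for the example.

```python
import pandas as pd

# Toy frame standing in for the loan data (values are illustrative only).
df = pd.DataFrame({
    "zip_code": ["367xx", "945xx", "773xx"],
    "grade": ["C", "A", "B"],
    "home_ownership": ["RENT", "MORTGAGE", "OWN"],
    "pymnt_plan": [0, 0, 0],  # single-valued, so it carries no information
})

# 1. Drop columns that hold only one distinct value.
single_valued = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
df = df.drop(columns=single_valued)

# 2. Keep only the leading three digits of the zip code.
df["zip_code"] = df["zip_code"].str[:3]

# 3. Map an ordinal variable onto a numerical scale
#    (assumed ordering: A is the best grade, G the worst).
grade_scale = {g: i for i, g in enumerate("ABCDEFG", start=1)}
df["grade"] = df["grade"].map(grade_scale)

# 4. Expand a categorical variable into dummy (one-hot) columns.
df = pd.get_dummies(df, columns=["home_ownership"])
```

Checking for single-valued columns with `nunique` rather than hard-coding names keeps the step reusable if the set of constant columns changes between data files.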
In [3]:
# increase some display options to display all columns and more rows.
pd.set_option('display.max_columns', None)
pd.options.display.max_rows = 150
In [4]:
# read in the 2016-2017 data set
original_df = pd.read_csv('data/data_test.csv', low_memory = False)
In [5]:
display(original_df.shape)
original_df.head()
(334109, 88)
Out[5]:
loan_amnt funded_amnt funded_amnt_inv term int_rate installment grade sub_grade emp_length home_ownership annual_inc verification_status issue_d loan_status pymnt_plan purpose zip_code addr_state dti delinq_2yrs earliest_cr_line fico_range_low fico_range_high inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status mths_since_last_major_derog application_type annual_inc_joint dti_joint acc_now_delinq open_acc_6m open_act_il open_il_12m open_il_24m mths_since_rcnt_il total_bal_il il_util open_rv_12m open_rv_24m max_bal_bc all_util inq_fi total_cu_tl inq_last_12m acc_open_past_24mths avg_cur_bal bc_open_to_buy bc_util chargeoff_within_12_mths delinq_amnt mo_sin_old_il_acct mo_sin_old_rev_tl_op mo_sin_rcnt_rev_tl_op mo_sin_rcnt_tl mort_acc mths_since_recent_bc mths_since_recent_bc_dlq mths_since_recent_inq mths_since_recent_revol_delinq num_accts_ever_120_pd num_actv_bc_tl num_actv_rev_tl num_bc_sats num_bc_tl num_il_tl num_op_rev_tl num_rev_accts num_rev_tl_bal_gt_0 num_sats num_tl_120dpd_2m num_tl_30dpd num_tl_90g_dpd_24m num_tl_op_past_12m pct_tl_nvr_dlq percent_bc_gt_75 pub_rec_bankruptcies tax_liens revol_bal_joint sec_app_mort_acc sec_app_revol_util sec_app_mths_since_last_major_derog
0 14000.0 14000.0 14000.0 1 15.99 340.38 C C5 10.0 RENT 43000.0 Source Verified 2017 Charged Off 0 debt_consolidation 367xx AL 21.80 1.0 1995 670.0 674.0 0.0 1 0 3.0 0.0 18537.0 99.1 8.0 1 0 0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1 8035.0 42.0 0.0 0.0 18537.0 70.0 0.0 0.0 0.0 0.0 8857.0 163.0 99.1 0.0 0.0 1 1 1 1 0.0 1 0 0 0 0.0 1.0 1.0 1.0 2.0 5.0 1.0 3.0 1.0 3.0 0.0 0.0 0.0 0.0 87.5 100.0 0.0 0.0 0.0 0.0 0.0 0
1 5000.0 5000.0 5000.0 0 14.99 173.31 C C4 10.0 RENT 68000.0 Not Verified 2017 Fully Paid 0 debt_consolidation 945xx CA 22.50 0.0 2003 660.0 664.0 0.0 1 0 6.0 0.0 10276.0 90.1 18.0 0 0 0 0.0 0.0 0.0 0.0 2.0 0.0 2.0 1 25892.0 64.0 0.0 0.0 4261.0 69.0 1.0 1.0 0.0 2.0 6028.0 1124.0 90.1 0.0 0.0 1 1 1 1 0.0 1 0 1 0 0.0 4.0 4.0 4.0 6.0 8.0 4.0 9.0 4.0 6.0 0.0 0.0 0.0 0.0 94.4 75.0 0.0 0.0 0.0 0.0 0.0 0
2 10150.0 10150.0 10150.0 0 7.24 314.52 A A3 8.0 MORTGAGE 50000.0 Not Verified 2017 Fully Paid 0 debt_consolidation 773xx TX 29.60 0.0 2002 740.0 744.0 1.0 0 0 9.0 0.0 21845.0 56.0 21.0 1 0 0 0.0 0.0 0.0 1.0 3.0 1.0 1.0 1 23502.0 43.0 0.0 0.0 11270.0 49.0 1.0 1.0 2.0 2.0 29908.0 13951.0 58.2 0.0 0.0 1 1 1 1 3.0 1 0 1 0 0.0 3.0 4.0 3.0 5.0 8.0 5.0 10.0 4.0 9.0 0.0 0.0 0.0 2.0 100.0 33.3 0.0 0.0 0.0 0.0 0.0 0
3 8400.0 8400.0 8400.0 0 11.39 276.56 B B3 8.0 MORTGAGE 50000.0 Source Verified 2017 Charged Off 0 other 454xx OH 15.63 0.0 2005 675.0 679.0 0.0 0 0 14.0 0.0 12831.0 30.3 30.0 1 0 0 0.0 0.0 0.0 3.0 2.0 1.0 2.0 1 38760.0 105.0 4.0 8.0 5338.0 65.0 4.0 1.0 7.0 10.0 12389.0 24145.0 33.1 0.0 0.0 1 1 1 1 4.0 1 0 1 0 0.0 4.0 5.0 7.0 11.0 9.0 11.0 16.0 5.0 14.0 0.0 0.0 0.0 5.0 100.0 14.3 0.0 0.0 0.0 0.0 0.0 0
4 10000.0 10000.0 10000.0 0 12.74 335.69 C C1 10.0 OWN 40000.0 Not Verified 2017 Fully Paid 0 debt_consolidation 324xx FL 8.85 0.0 1997 700.0 704.0 0.0 0 0 7.0 0.0 9227.0 55.9 15.0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1 0.0 0.0 1.0 3.0 5454.0 56.0 0.0 0.0 1.0 3.0 1318.0 1691.0 79.4 0.0 0.0 1 1 1 1 2.0 1 0 1 0 0.0 2.0 4.0 2.0 3.0 2.0 7.0 11.0 4.0 7.0 0.0 0.0 0.0 1.0 100.0 50.0 0.0 0.0 0.0 0.0 0.0 0

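We now use pd.DataFrame.corr() to compute the pairwise Pearson correlation between the numeric variables. Coefficients range from -1 to 1: a value of 1.0 means two variables are perfectly positively correlated, and a value of 0 means they have no linear correlation. We flag any pair whose coefficient exceeds a threshold of $0.9$ as highly correlated. Note that the cell below compares the signed coefficient with the threshold, so only strong positive correlations are highlighted.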
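
Because a full correlation matrix is hard to scan by eye, it can help to list only the pairs that cross the threshold. The following is a minimal sketch on synthetic data, not part of the original analysis: the toy columns (including `neg_copy` and `noise`) are hypothetical, and the use of the absolute value is an assumption made here so that strong negative correlations are reported as well.

```python
import numpy as np
import pandas as pd

# Synthetic data: two near-duplicate columns, one negated copy, one unrelated column.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "loan_amnt": base,
    "funded_amnt": base + rng.normal(scale=0.01, size=200),  # near-duplicate
    "neg_copy": -base,                                        # perfectly negatively correlated
    "noise": rng.normal(size=200),                            # unrelated
})

threshold = 0.9
corr = toy.corr()

# Keep the strict upper triangle so each pair appears once and the diagonal is skipped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()

# Compare the absolute value so strong negative correlations are flagged too.
high_pairs = pairs[pairs.abs() > threshold]
print(high_pairs)
```

Each entry of `high_pairs` is indexed by a pair of column names, which makes it straightforward to decide which member of a highly correlated pair to drop.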

In [6]:
correlations = original_df.corr()
threshold = 0.9

# display the entire matrix and highlight in red every correlation above the threshold
correlations.style.apply(lambda x: ["background: red" if v > threshold else "" for v in x], axis = 1)
Out[6]:
loan_amnt funded_amnt funded_amnt_inv term int_rate installment emp_length annual_inc issue_d pymnt_plan dti delinq_2yrs earliest_cr_line fico_range_low fico_range_high inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status mths_since_last_major_derog application_type annual_inc_joint dti_joint acc_now_delinq open_acc_6m open_act_il open_il_12m open_il_24m mths_since_rcnt_il total_bal_il il_util open_rv_12m open_rv_24m max_bal_bc all_util inq_fi total_cu_tl inq_last_12m acc_open_past_24mths avg_cur_bal bc_open_to_buy bc_util chargeoff_within_12_mths delinq_amnt mo_sin_old_il_acct mo_sin_old_rev_tl_op mo_sin_rcnt_rev_tl_op mo_sin_rcnt_tl mort_acc mths_since_recent_bc mths_since_recent_bc_dlq mths_since_recent_inq mths_since_recent_revol_delinq num_accts_ever_120_pd num_actv_bc_tl num_actv_rev_tl num_bc_sats num_bc_tl num_il_tl num_op_rev_tl num_rev_accts num_rev_tl_bal_gt_0 num_sats num_tl_120dpd_2m num_tl_30dpd num_tl_90g_dpd_24m num_tl_op_past_12m pct_tl_nvr_dlq percent_bc_gt_75 pub_rec_bankruptcies tax_liens revol_bal_joint sec_app_mort_acc sec_app_revol_util sec_app_mths_since_last_major_derog
loan_amnt 1 1 0.999993 0.371943 0.171388 0.955203 0.100803 0.290591 -0.0223079 nan 0.0354099 -0.00144835 -0.138749 0.0734488 0.0734488 -0.00949889 -0.00402864 -0.067105 0.17634 -0.0470971 0.300435 0.119665 0.190422 0.0639881 -0.0380772 0.10254 0.128165 0.0989231 0.0017271 -0.0153681 0.0304857 0.00200697 0.0315075 0.0499328 0.157087 -0.0383439 -0.034697 -0.017261 0.353818 0.0146875 0.00764786 0.0695528 0.0164975 0.019626 0.215998 0.188939 0.0761079 -0.00195514 0.00276253 nan nan nan nan 0.212504 0.0462264 -0.0068254 0.0138877 -0.00389924 -0.043254 0.199607 0.161251 0.220138 0.199425 0.0729088 0.160412 0.163052 0.157129 0.176213 0.0013363 0.00312459 -0.0185283 -0.012524 0.0702641 0.04447 -0.0722752 0.0132666 0.0917166 0.0654644 0.0651616 0.025622
funded_amnt 1 1 0.999993 0.371943 0.171388 0.955203 0.100803 0.290591 -0.0223079 nan 0.0354099 -0.00144835 -0.138749 0.0734488 0.0734488 -0.00949889 -0.00402864 -0.067105 0.17634 -0.0470971 0.300435 0.119665 0.190422 0.0639881 -0.0380772 0.10254 0.128165 0.0989231 0.0017271 -0.0153681 0.0304857 0.00200697 0.0315075 0.0499328 0.157087 -0.0383439 -0.034697 -0.017261 0.353818 0.0146875 0.00764786 0.0695528 0.0164975 0.019626 0.215998 0.188939 0.0761079 -0.00195514 0.00276253 nan nan nan nan 0.212504 0.0462264 -0.0068254 0.0138877 -0.00389924 -0.043254 0.199607 0.161251 0.220138 0.199425 0.0729088 0.160412 0.163052 0.157129 0.176213 0.0013363 0.00312459 -0.0185283 -0.012524 0.0702641 0.04447 -0.0722752 0.0132666 0.0917166 0.0654644 0.0651616 0.025622
funded_amnt_inv 0.999993 0.999993 1 0.372176 0.171397 0.955109 0.100855 0.290609 -0.0221908 nan 0.0353379 -0.00149396 -0.138754 0.0735679 0.0735679 -0.0095706 -0.00407488 -0.0671567 0.176311 -0.0471432 0.300432 0.119642 0.190395 0.0647964 -0.0381205 0.102532 0.12816 0.0989095 0.00170044 -0.0154125 0.0304666 0.00198669 0.0314859 0.0499276 0.157077 -0.0383544 -0.0347373 -0.0173085 0.353833 0.0146675 0.00763789 0.0695417 0.0164552 0.0195828 0.216055 0.188967 0.0760853 -0.00196807 0.00274976 nan nan nan nan 0.212543 0.0462363 -0.00685981 0.0138697 -0.00393152 -0.0432889 0.199572 0.161207 0.220116 0.199397 0.0728883 0.160384 0.163024 0.157086 0.176186 0.00131546 0.00310217 -0.018561 -0.0125626 0.0703008 0.0444573 -0.0723112 0.0132395 0.0917409 0.0654809 0.065188 0.0256408
term 0.371943 0.371943 0.372176 1 0.394674 0.14862 0.0600385 0.0547661 0.0123301 nan 0.0483536 -0.0106056 -0.0360779 0.00229987 0.00230004 0.00795103 -0.0100385 -0.0059825 0.0711056 -0.0108748 0.0722922 0.0578313 0.0880162 0.176625 -0.0140784 0.0642611 0.0643084 0.0637183 -0.00242646 0.0139953 0.0343187 0.0419519 0.0627368 0.0352736 0.0769668 0.042335 -0.0146462 -0.00613441 0.0903663 0.050348 0.027585 0.0538909 0.0303021 0.0368651 0.0757197 0.0209426 0.0446585 -0.00225203 0.000162738 nan nan nan nan 0.0901313 0.0152259 -0.00936181 0.0248917 -0.0117617 -0.0171136 0.0527023 0.0559652 0.0602925 0.0500086 0.0653109 0.0507575 0.0501363 0.0528589 0.0707827 -0.00132722 -0.00163741 -0.0113272 0.0181777 0.0383303 0.0400646 -0.0025567 -0.00782063 0.0546446 0.0462316 0.0512744 0.0269252
int_rate 0.171388 0.171388 0.171397 0.394674 1 0.214653 -0.0163869 -0.0710463 0.059859 nan 0.158991 0.0322546 0.119122 -0.358621 -0.358618 0.199015 0.0460827 0.0646569 0.0110733 0.0548128 -0.0107056 0.212328 -0.0444001 -0.162455 0.0596923 0.0411864 0.0263215 0.0560887 0.0065543 0.170999 0.0333187 0.19403 0.171914 -0.0102362 0.0376085 0.125128 0.150425 0.158809 -0.0424092 0.25345 0.145232 0.0204437 0.1967 0.208535 -0.0883193 -0.244328 0.210012 0.00647974 0.00431558 nan nan nan nan -0.111063 -0.0289126 0.0239265 0.095931 0.0266176 0.0362125 0.0534045 0.0947076 -0.0243449 -0.0745048 0.0154559 0.00648551 -0.0537557 0.0955494 0.0108543 0.00235276 0.00567208 0.0198795 0.214301 -0.0472754 0.205924 0.0569179 0.0151899 0.0305633 0.00964099 0.0466987 0.0322549
installment 0.955203 0.955203 0.955109 0.14862 0.214653 1 0.0877734 0.272842 -0.0204832 nan 0.0473371 0.0060309 -0.115743 0.0210835 0.0210838 0.0163271 0.00647653 -0.0553965 0.167188 -0.0355255 0.28601 0.139345 0.16772 -0.00837457 -0.0264476 0.0913125 0.113794 0.0907054 0.00338448 0.00464014 0.0273263 0.0181085 0.0411935 0.0420945 0.146332 -0.0330059 -0.00946319 0.00898484 0.331862 0.0367535 0.0209666 0.0597467 0.037195 0.0424672 0.184681 0.150411 0.098197 -0.000521834 0.00357904 nan nan nan nan 0.175254 0.0421676 -0.000387367 0.0207774 0.00415115 -0.0346932 0.20317 0.168627 0.209793 0.184321 0.0598414 0.156451 0.149827 0.165577 0.16713 0.00192625 0.00431939 -0.0139407 0.0132499 0.0553822 0.0651471 -0.0633205 0.018998 0.0794495 0.0522894 0.0577813 0.0232078
emp_length 0.100803 0.100803 0.100855 0.0600385 -0.0163869 0.0877734 1 0.0991428 -0.0165807 nan -0.0122115 0.0192935 -0.115603 0.00891098 0.00891115 0.0031631 0.0456234 0.0160075 0.0668743 0.0123518 0.0870979 0.0509222 0.103768 0.0200416 0.0245972 -0.0747383 -0.0584512 -0.0694189 0.0108487 0.0226559 -0.0626386 0.0474697 0.0610861 0.0523735 -0.00350992 -0.019046 0.00905316 0.00405258 0.0713191 -0.0147168 0.0092254 0.0834455 0.000861288 0.038457 0.0955353 0.0212737 0.0420823 0.00673721 0.00203488 nan nan nan nan 0.158505 0.00463817 0.0261163 0.00435118 0.030035 0.00493574 0.0758387 0.111782 0.0688841 0.0899515 0.00275573 0.0982971 0.11622 0.112049 0.0659711 0.00146513 0.0105402 -0.00294973 0.0352015 -0.0166497 0.0374231 0.00503536 0.00531877 -0.0640999 -0.0570702 -0.0825761 -0.0539475
annual_inc 0.290591 0.290591 0.290609 0.0547661 -0.0710463 0.272842 0.0991428 1 0.00654048 nan -0.126809 0.0282487 -0.119002 0.058931 0.0589308 0.0292207 0.0405467 -0.0244687 0.126109 -0.00382472 0.272479 0.0446215 0.15851 0.0346802 0.0116384 -0.0536792 -0.0215331 -0.0542628 0.0128571 0.0373415 0.0633142 0.0797827 0.102467 0.0499242 0.201533 0.00228085 -0.00400232 -0.0103785 0.24945 0.0067374 0.053162 0.0419613 0.064537 0.0610748 0.287459 0.153362 0.0121136 0.00481134 0.00818791 nan nan nan nan 0.198136 0.0176033 0.0295602 0.0316428 0.0280972 0.00959516 0.106956 0.0792329 0.123213 0.128081 0.0929922 0.0782448 0.0984909 0.077653 0.125925 0.00766741 0.0105974 0.00193351 0.0520409 0.000633011 0.000613847 -0.0367303 0.0350064 -0.0211773 -0.0262688 -0.0391232 -0.0272948
issue_d -0.0223079 -0.0223079 -0.0221908 0.0123301 0.059859 -0.0204832 -0.0165807 0.00654048 1 nan -0.00674063 -0.0144732 0.0647172 0.0604769 0.0604766 -0.00617351 -0.0207553 -0.0081671 -0.0167096 -0.0157999 -0.0115209 -0.066062 -0.0251697 -0.0257494 -0.0190358 0.131558 0.123699 0.12324 -0.00929717 -0.0237732 -0.0104031 -0.0312628 -0.00431884 -0.00258569 0.0028623 -0.0295579 -0.0136302 -0.0139293 -0.0016616 -0.0746282 0.0260245 0.00432069 -0.00388903 -0.00624408 0.0272625 0.0629564 -0.0728878 0.00153744 -0.00383244 nan nan nan nan -0.0154189 -0.00796961 -0.0223056 0.000616289 -0.0197745 -0.00689764 -0.0152928 -0.032089 0.00675165 -0.0233043 -0.00598991 -0.00992501 -0.0268819 -0.0363046 -0.0163005 -0.00344101 -0.00769706 -0.00695552 -0.0211688 0.00721334 -0.0597402 0.0029982 -0.0132262 0.153795 0.135667 0.181264 0.120069
pymnt_plan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
dti 0.0354099 0.0354099 0.0353379 0.0483536 0.158991 0.0473371 -0.0122115 -0.126809 -0.00674063 nan 1 -0.0104998 -0.0338418 -0.0540789 -0.0540823 0.00111515 -0.0170658 -0.0245811 0.195946 -0.0305627 0.103301 0.148754 0.160071 -0.0259364 -0.0273196 0.189708 0.167284 0.231641 -0.00105407 0.0302801 0.173504 0.131945 0.180339 0.11519 0.192938 0.165749 -0.000557731 0.0239507 0.0851506 0.170675 0.055609 0.09578 0.0308119 0.101628 -0.0724721 -0.063867 0.150757 -0.00259618 -0.00724449 nan nan nan nan -0.0128473 0.00830909 -0.0249932 0.0291981 -0.0196061 -0.0274789 0.126229 0.181384 0.0754692 0.0552756 0.157749 0.125151 0.097101 0.184929 0.195886 -0.00494301 0.001442 -0.0118116 0.0566104 0.0640695 0.133691 -0.0127161 -0.0257022 0.157884 0.12818 0.156272 0.0887798
delinq_2yrs -0.00144835 -0.00144835 -0.00149396 -0.0106056 0.0322546 0.0060309 0.0192935 0.0282487 -0.0144732 nan -0.0104998 1 -0.0749717 -0.171801 -0.171798 0.0269384 0.344679 -0.0506505 0.0522946 -0.0366986 -0.0226474 -0.00458231 0.112875 -0.011731 0.226492 -0.00262258 0.00268839 -0.00373691 0.123617 -0.000239808 0.0640688 -0.00885671 -0.0258594 0.0277418 0.0485201 0.0141296 -0.0234147 -0.0489541 -0.0515156 0.0213504 0.0229035 0.0172532 0.0264162 -0.0585419 0.0317852 -0.061073 -0.00470567 0.152355 0.035054 nan nan nan nan 0.0571907 -0.0158934 0.317727 0.0128847 0.352637 0.216559 -0.0328287 0.00268074 -0.026932 0.0308758 0.082474 0.0110338 0.0751926 -0.00182585 0.049985 0.0472085 0.103201 0.665355 -0.0285821 -0.440008 -0.00673088 -0.064268 0.00593368 -0.00670418 -0.00149503 -0.00247188 0.0117696
earliest_cr_line -0.138749 -0.138749 -0.138754 -0.0360779 0.119122 -0.115743 -0.115603 -0.119002 0.0647172 nan -0.0338418 -0.0749717 1 -0.090225 -0.0902306 -0.00866513 -0.132666 -0.0682431 -0.137432 -0.059932 -0.200211 -0.0336224 -0.265802 -0.0406771 -0.112574 0.0119714 -0.00187587 0.00987256 -0.0314438 -0.0118584 0.0583278 -0.00247454 0.00875091 -0.036872 -0.0285505 0.0601059 0.00055375 0.024354 -0.190607 0.0536393 0.0292811 -0.0403152 0.0223407 0.021144 -0.118321 -0.140977 -0.0234079 -0.0266618 -0.00867312 nan nan nan nan -0.27796 -0.00855518 -0.120467 0.019383 -0.132552 -0.0854754 -0.116417 -0.15247 -0.13836 -0.263238 -0.0181363 -0.169198 -0.307887 -0.149252 -0.133176 -0.0106247 -0.0273597 -0.0329759 -0.00379716 0.0884751 -0.0353742 -0.0560401 -0.0281559 -0.00784146 -0.0099418 0.0119592 0.0037996
fico_range_low 0.0734488 0.0734488 0.0735679 0.00229987 -0.358621 0.0210835 0.00891098 0.058931 0.0604769 nan -0.0540789 -0.171801 -0.090225 1 1 -0.097505 -0.307014 -0.243953 0.0196743 -0.20027 -0.000530307 -0.465726 0.0252076 0.0770303 -0.282168 0.0178947 0.0257033 0.010836 -0.0428563 -0.0616149 -0.0134152 -0.0216399 -0.017437 -0.00255944 0.0117294 -0.0818219 -0.109483 -0.133455 0.0460722 -0.416169 -0.0815677 -0.00267085 -0.126787 -0.10921 0.10444 0.507928 -0.470744 -0.0528395 -0.0151574 nan nan nan nan 0.104666 0.00784537 -0.202268 -0.094217 -0.236348 -0.194331 -0.120806 -0.19519 0.0642446 0.0737801 -0.0171835 0.0134573 0.0295203 -0.195455 0.0240883 -0.0180953 -0.0321727 -0.101024 -0.0953209 0.296871 -0.399495 -0.206976 -0.0617305 0.0357951 0.0349613 0.0193274 0.00648239
fico_range_high 0.0734488 0.0734488 0.0735679 0.00230004 -0.358618 0.0210838 0.00891115 0.0589308 0.0604766 nan -0.0540823 -0.171798 -0.0902306 1 1 -0.0975053 -0.307012 -0.243949 0.0196713 -0.200267 -0.000532927 -0.465723 0.0252047 0.0770285 -0.282164 0.0178956 0.025704 0.0108358 -0.0428545 -0.0616173 -0.0134182 -0.0216427 -0.0174411 -0.00256083 0.0117265 -0.0818255 -0.109485 -0.133456 0.0460666 -0.41617 -0.0815686 -0.00267086 -0.126788 -0.109214 0.104442 0.507929 -0.470741 -0.052837 -0.0151572 nan nan nan nan 0.104667 0.00784578 -0.202266 -0.0942215 -0.236346 -0.194328 -0.120807 -0.19519 0.0642422 0.0737779 -0.0171868 0.0134556 0.0295188 -0.195455 0.0240852 -0.018095 -0.0321722 -0.101022 -0.0953241 0.296867 -0.399491 -0.206973 -0.0617298 0.0357942 0.0349624 0.0193286 0.00648387
inq_last_6mths -0.00949889 -0.00949889 -0.0095706 0.00795103 0.199015 0.0163271 0.0031631 0.0292207 -0.00617351 nan 0.00111515 0.0269384 -0.00866513 -0.097505 -0.0975053 1 0.0402042 0.0876163 0.167398 0.0750018 -0.00286105 -0.102894 0.160403 -0.0554694 0.0821566 -0.0223655 -0.0208437 -0.0196721 -0.00414494 0.424702 0.0306383 0.145311 0.117973 0.0198792 0.0504319 0.0717362 0.341016 0.298656 -0.0542851 -0.048355 0.218088 0.0313672 0.484791 0.303038 -0.0440917 0.0318196 -0.0897593 0.0102862 -0.00224899 nan nan nan nan 0.0126668 0.0173994 0.0349001 0.209879 0.0338875 0.0462168 0.101559 0.156783 0.15623 0.141211 0.0668369 0.175681 0.172687 0.129672 0.163325 -0.00281417 -0.00666443 0.0277039 0.353596 -0.022165 -0.0854127 0.0855617 0.0181919 -0.0193547 -0.0154953 -0.021927 -0.00925314
mths_since_last_delinq -0.00402864 -0.00402864 -0.00407488 -0.0100385 0.0460827 0.00647653 0.0456234 0.0405467 -0.0207553 nan -0.0170658 0.344679 -0.132666 -0.307014 -0.307012 0.0402042 1 -0.0467039 0.0563469 -0.0189539 -0.0471993 0.0101965 0.164877 -0.00998736 0.570468 -0.00666193 -0.000829696 -0.0068887 0.0689108 0.0491603 0.0527387 0.0388924 0.0373714 0.0681927 0.0643094 0.0429723 0.0417107 0.0353053 -0.0817066 0.053505 0.073993 0.0386346 0.074587 0.047586 0.0490007 -0.111267 0.0182409 0.0702054 0.0207741 nan nan nan nan 0.107099 -0.0147326 0.550657 0.0537591 0.707576 0.362446 -0.0361648 0.0130495 -0.0490637 0.0499351 0.120277 0.0219983 0.103071 0.0165125 0.0554331 0.0287721 0.0562023 0.152675 0.0539345 -0.608835 0.0043547 -0.0888289 0.0265877 -0.0151579 -0.00237543 -0.00673003 0.0257454
mths_since_last_record -0.067105 -0.067105 -0.0671567 -0.0059825 0.0646569 -0.0553965 0.0160075 -0.0244687 -0.0081671 nan -0.0245811 -0.0506505 -0.0682431 -0.243953 -0.243949 0.0876163 -0.0467039 1 -0.0119334 0.796918 -0.11151 -0.0671207 0.010897 -0.0179055 -0.000451319 -0.000572396 -0.00698251 -0.00275022 -0.0054401 0.0661928 -0.0315764 0.0486684 0.0540555 0.0218196 -0.0264372 0.0319805 0.093185 0.122713 -0.144157 -0.0136064 0.0762482 0.00847277 0.105516 0.123906 -0.0736874 -0.113859 -0.0168851 -0.011433 0.000426055 nan nan nan nan -0.0264072 0.00549048 -0.05204 0.0695888 -0.0663142 -0.0108138 -0.0466023 0.00624367 -0.0602441 -0.0142455 -0.0135545 0.0195522 0.035464 0.00290667 -0.0173759 0.00103694 -0.00998001 -0.0169895 0.0987125 0.0368363 -0.0300964 0.776127 0.299501 -0.0158114 -0.00506234 -0.00774973 -0.0118531
open_acc 0.17634 0.17634 0.176311 0.0711056 0.0110733 0.167188 0.0668743 0.126109 -0.0167096 nan 0.195946 0.0522946 -0.137432 0.0196743 0.0196713 0.167398 0.0563469 -0.0119334 1 -0.015478 0.21703 -0.135009 0.711668 0.0152771 0.0159066 -0.0184166 -0.000557158 -0.000784209 0.0224183 0.281506 0.51498 0.167873 0.237871 0.108818 0.342034 0.21282 0.369619 0.473997 0.106785 -0.0123935 0.108395 0.101001 0.184489 0.51123 -0.124814 0.305943 -0.0741913 0.005978 0.00331224 nan nan nan nan 0.128376 0.0907996 0.0272822 0.128645 0.0595541 0.0314553 0.551326 0.661589 0.6326 0.5457 0.38191 0.840337 0.671288 0.663984 0.998662 0.00466632 0.0196159 0.0159108 0.391381 0.100041 -0.0750634 -0.0134098 -0.0073593 0.0107744 -0.0037312 -0.0234721 -0.0122987
pub_rec -0.0470971 -0.0470971 -0.0471432 -0.0108748 0.0548128 -0.0355255 0.0123518 -0.00382472 -0.0157999 nan -0.0305627 -0.0366986 -0.059932 -0.20027 -0.200267 0.0750018 -0.0189539 0.796918 -0.015478 1 -0.0872123 -0.0517418 -0.00591755 -0.0174667 0.0148058 -0.00585126 -0.00959029 -0.00793729 -0.00257833 0.0568102 -0.028641 0.0374344 0.0381032 0.0166762 -0.0223009 0.0232084 0.0807145 0.103701 -0.114796 -0.00883367 0.0674735 -0.00157323 0.0904722 0.0998176 -0.0579094 -0.0907413 -0.0153993 -0.00895587 0.00281559 nan nan nan nan -0.0255759 0.005651 -0.0322887 0.0573158 -0.0431191 0.00321878 -0.029659 0.00564454 -0.044667 -0.0180016 -0.0178215 0.0107301 0.0135757 0.00396216 -0.019072 0.00475399 -0.00658831 -0.0117064 0.0823809 0.0121686 -0.0295045 0.654163 0.686977 -0.0161148 -0.0083866 -0.0103497 -0.0104793
revol_bal 0.300435 0.300435 0.300432 0.0722922 -0.0107056 0.28601 0.0870979 0.272479 -0.0115209 nan 0.103301 -0.0226474 -0.200211 -0.000530307 -0.000532927 -0.00286105 -0.0471993 -0.11151 0.21703 -0.0872123 1 0.262357 0.185866 0.0219457 -0.0746145 -0.00300515 0.0172646 0.00648542 0.00352412 -0.0193387 -0.0012892 -0.0352578 -0.0244266 0.0215199 0.0925043 -0.0592588 -0.00142028 0.00277935 0.546885 0.109251 -0.0468321 0.0357439 -0.0253142 -0.00365459 0.281683 0.150496 0.191461 -0.00967216 0.00223265 nan nan nan nan 0.216294 0.0414679 -0.0326004 -0.00693511 -0.0274949 -0.070373 0.309452 0.309683 0.279112 0.232235 0.0125949 0.226513 0.215664 0.308051 0.215405 -0.00285345 0.0069155 -0.028822 -0.0157294 0.0965034 0.157006 -0.107666 -0.010275 0.0350645 0.00766717 0.00317158 -0.0122944
revol_util 0.119665 0.119665 0.119642 0.0578313 0.212328 0.139345 0.0509222 0.0446215 -0.066062 nan 0.148754 -0.00458231 -0.0336224 -0.465726 -0.465723 -0.102894 0.0101965 -0.0671207 -0.135009 -0.0517418 0.262357 1 -0.108596 -0.0264031 0.00484744 0.0246415 0.0259453 0.0342257 -0.0303393 -0.181813 0.0458067 -0.0838585 -0.0664247 0.01289 0.0328792 -0.00230849 -0.197121 -0.215097 0.3258 0.65511 -0.0734558 0.0355184 -0.123023 -0.209715 0.142375 -0.464889 0.841099 -0.0120491 -0.00763357 nan nan nan nan 0.0337002 0.0104042 -0.00121411 -0.063428 -0.00476487 -0.0197588 0.114271 0.122157 -0.11625 -0.150926 0.0158534 -0.201499 -0.181344 0.130536 -0.135513 -0.0146131 -0.0215178 -0.00928085 -0.206963 -0.0403029 0.723753 -0.0690148 -0.00643908 0.0198315 0.00492423 0.0295645 -0.00544781
total_acc 0.190422 0.190422 0.190395 0.0880162 -0.0444001 0.16772 0.103768 0.15851 -0.0251697 nan 0.160071 0.112875 -0.265802 0.0252076 0.0252047 0.160403 0.164877 0.010897 0.711668 -0.00591755 0.185866 -0.108596 1 0.0340599 0.12161 -0.0146581 0.00482136 0.00074875 0.0288889 0.258755 0.373834 0.254223 0.348491 0.16105 0.407291 0.185937 0.257423 0.321715 0.121812 -0.00023709 0.16573 0.29089 0.213623 0.455349 0.0360446 0.244732 -0.0744188 0.0361676 0.0043422 nan nan nan nan 0.347698 0.0360243 0.126063 0.132395 0.153044 0.145121 0.304611 0.404457 0.412615 0.621164 0.69256 0.577508 0.761235 0.405051 0.708356 0.00882538 0.0229275 0.0656149 0.354305 0.0289515 -0.0646244 0.0234767 -0.0252072 0.0101585 0.0147019 -0.020617 -0.0080504
initial_list_status 0.0639881 0.0639881 0.0647964 0.176625 -0.162455 -0.00837457 0.0200416 0.0346802 -0.0257494 nan -0.0259364 -0.011731 -0.0406771 0.0770303 0.0770285 -0.0554694 -0.00998736 -0.0179055 0.0152771 -0.0174667 0.0219457 -0.0264031 0.0340599 1 -0.0163854 0.0156404 0.0196043 0.0126395 -0.000441343 -0.0436611 0.00142055 -0.0390542 -0.0274876 0.0124691 0.0131395 -0.0197974 -0.0463635 -0.044386 0.0388421 -0.0430698 -0.0301753 0.0112963 -0.0453963 -0.0452869 0.0415878 0.0553371 -0.028589 -0.000363524 -0.00318297 nan nan nan nan 0.0511063 0.0128764 -0.00611035 -0.0138002 -0.00568527 -0.0122598 0.00199673 -0.00581582 0.0205778 0.0304233 0.0147074 0.0106175 0.0249612 -0.00742326 0.0153477 0.000884112 -0.000503387 -0.0117098 -0.053438 0.0182647 -0.0279433 -0.0149067 -0.00594051 0.0176648 0.0160926 0.0132245 0.00580819
mths_since_last_major_derog -0.0380772 -0.0380772 -0.0381205 -0.0140784 0.0596923 -0.0264476 0.0245972 0.0116384 -0.0190358 nan -0.0273196 0.226492 -0.112574 -0.282168 -0.282164 0.0821566 0.570468 -0.000451319 0.0159066 0.0148058 -0.0746145 0.00484744 0.12161 -0.0163854 1 -0.0138163 -0.01033 -0.0145247 0.0355687 0.0797902 0.035763 0.0460466 0.0300871 0.049313 0.0439697 0.0624107 0.0929728 0.0899842 -0.115907 0.0576812 0.069348 0.00367465 0.0917672 0.0852758 0.0144104 -0.125023 0.0105724 0.122622 0.0333099 nan nan nan nan 0.066741 -0.00798391 0.383217 0.0621868 0.40456 0.584653 -0.0278289 0.0208443 -0.038223 0.0409552 0.0925125 -0.00827835 0.0716812 0.00823319 0.0135839 0.0468141 -0.00173777 0.271122 0.097447 -0.559342 -0.00136059 -0.0395159 0.019823 -0.020143 -0.00844678 -0.0127033 0.0311284
application_type 0.10254 0.10254 0.102532 0.0642611 0.0411864 0.0913125 -0.0747383 -0.0536792 0.131558 nan 0.189708 -0.00262258 0.0119714 0.0178947 0.0178956 -0.0223655 -0.00666193 -0.000572396 -0.0184166 -0.00585126 -0.00300515 0.0246415 -0.0146581 0.0156404 -0.0138163 1 0.900298 0.925836 -0.0022275 -0.0156001 -0.00278045 -0.013887 -0.0119551 -0.00150543 0.011471 -0.0107439 -0.0262963 -0.0262304 0.000584367 0.016174 0.0142427 0.0281037 0.00287337 -0.0214221 0.0427614 -0.0195873 0.019681 -0.00785714 -0.0028434 nan nan nan nan 0.0332519 -0.0190638 -0.00984521 0.00122771 -0.00934307 -0.00791016 -0.0311423 -0.0208105 -0.0395794 -0.0383825 -0.00106958 -0.0229003 -0.0268364 -0.0160854 -0.0177416 -0.00323222 0.000418642 -0.00540271 -0.0241895 0.00370765 0.0252186 0.00528841 -0.00976738 0.572076 0.504644 0.674253 0.446625
annual_inc_joint 0.128165 0.128165 0.12816 0.0643084 0.0263215 0.113794 -0.0584512 -0.0215331 0.123699 nan 0.167284 0.00268839 -0.00187587 0.0257033 0.025704 -0.0208437 -0.000829696 -0.00698251 -0.000557158 -0.00959029 0.0172646 0.0259453 0.00482136 0.0196043 -0.01033 0.900298 1 0.803658 -0.000826569 -0.0113082 0.00611445 -0.00859938 -0.00526274 0.00420784 0.0306911 -0.0107355 -0.0234255 -0.0238085 0.0234647 0.0138116 0.0154923 0.0326106 0.00616531 -0.0147903 0.0676954 -0.00159911 0.0183556 -0.00702118 -0.00256141 nan nan nan nan 0.0503214 -0.0127925 -0.00496629 0.00317881 -0.00351716 -0.00422056 -0.0160728 -0.00925781 -0.0229321 -0.0208823 0.00943032 -0.00966986 -0.0121454 -0.00449231 0.000173295 -0.0029595 0.0019704 -0.00378128 -0.0178815 0.0012872 0.0224391 -0.0023287 -0.00788305 0.634224 0.525145 0.628761 0.409268
dti_joint 0.0989231 0.0989231 0.0989095 0.0637183 0.0560887 0.0907054 -0.0694189 -0.0542628 0.12324 nan 0.231641 -0.00373691 0.00987256 0.010836 0.0108358 -0.0196721 -0.0068887 -0.00275022 -0.000784209 -0.00793729 0.00648542 0.0342257 0.00074875 0.0126395 -0.0145247 0.925836 0.803658 1 -0.0020982 -0.0118766 0.0116995 -0.00107468 0.00566236 0.00630324 0.0286964 0.00122126 -0.0239755 -0.0222214 0.00748965 0.0274741 0.0204248 0.036244 0.00762963 -0.0104125 0.0302414 -0.0224684 0.0295882 -0.00673212 -0.0026222 nan nan nan nan 0.0289413 -0.0170789 -0.0111498 0.00414795 -0.0104194 -0.00874896 -0.018663 -0.00485167 -0.029926 -0.0306313 0.0145116 -0.0107856 -0.0173175 0.000179617 -0.000113842 -0.0031668 0.000314872 -0.00559278 -0.0170224 0.00900865 0.0338727 0.00289668 -0.0104434 0.610817 0.473685 0.655014 0.406984
acc_now_delinq 0.0017271 0.0017271 0.00170044 -0.00242646 0.0065543 0.00338448 0.0108487 0.0128571 -0.00929717 nan -0.00105407 0.123617 -0.0314438 -0.0428563 -0.0428545 -0.00414494 0.0689108 -0.0054401 0.0224183 -0.00257833 0.00352412 -0.0303393 0.0288889 -0.000441343 0.0355687 -0.0022275 -0.000826569 -0.0020982 1 -0.00761881 0.007005 -0.00485305 -0.00670858 0.00667966 0.00867382 -0.00438624 -0.00527453 -0.00443329 -0.00264773 -0.0250433 -0.00524045 0.00434626 -0.00538684 -0.00690889 0.0139345 0.0184271 -0.0292296 0.0425989 0.203697 nan nan nan nan 0.0262094 -0.00634398 0.0495909 -0.000893305 0.062975 0.0182332 -0.00295717 0.00554845 0.00178831 0.0176981 0.00768579 0.0167498 0.0284013 0.00383753 0.0115189 0.40493 0.795964 0.0583675 -0.00717039 -0.0489672 -0.0262401 -0.0115422 0.00792196 -0.00329687 -0.00300911 -0.00518892 -0.00104258
open_acc_6m -0.0153681 -0.0153681 -0.0154125 0.0139953 0.170999 0.00464014 0.0226559 0.0373415 -0.0237732 nan 0.0302801 -0.000239808 -0.0118584 -0.0616149 -0.0616173 0.424702 0.0491603 0.0661928 0.281506 0.0568102 -0.0193387 -0.181813 0.258755 -0.0436611 0.0797902 -0.0156001 -0.0113082 -0.0118766 -0.00761881 1 0.0800516 0.38216 0.29324 0.0465493 0.11244 0.161022 0.616661 0.474521 -0.0835329 -0.0462331 0.153737 0.0913306 0.308557 0.549247 -0.0335654 0.099022 -0.160855 0.0018867 -0.00455355 nan nan nan nan 0.0499448 0.0257039 0.0328249 0.151626 0.035032 0.0597546 0.134118 0.206671 0.1967 0.191733 0.136626 0.277622 0.245263 0.19597 0.280231 -0.00618333 -0.00669984 0.0095132 0.720908 0.00667259 -0.151688 0.0558972 0.0152902 -0.0126559 -0.00728085 -0.0204226 -0.00799011
open_act_il 0.0304857 0.0304857 0.0304666 0.0343187 0.0333187 0.0273263 -0.0626386 0.0633142 -0.0104031 nan 0.173504 0.0640688 0.0583278 -0.0134152 -0.0134182 0.0306383 0.0527387 -0.0315764 0.51498 -0.028641 -0.0012892 0.0458067 0.373834 0.00142055 0.035763 -0.00278045 0.00611445 0.0116995 0.007005 0.0800516 1 0.270178 0.35514 0.147997 0.556975 0.426248 -0.0165188 -0.0177898 0.0102881 0.371163 0.0892446 0.0855759 0.0774622 0.164241 -0.0494235 -0.0312765 0.045203 -0.00194676 -0.00280131 nan nan nan nan -0.0309252 -0.00353267 0.0142301 0.0499328 0.0199633 0.0883981 -0.0070089 -0.0021399 -0.0173969 -0.0285025 0.630083 -0.00685534 -0.0186272 0.00133266 0.517462 0.000432986 0.00593036 0.0614521 0.12235 -0.0109671 0.0425085 -0.0286686 -0.0122823 0.00421186 -0.00247352 -0.00348556 -0.000702324
open_il_12m 0.00200697 0.00200697 0.00198669 0.0419519 0.19403 0.0181085 0.0474697 0.0797827 -0.0312628 nan 0.131945 -0.00885671 -0.00247454 -0.0216399 -0.0216427 0.145311 0.0388924 0.0486684 0.167873 0.0374344 -0.0352578 -0.0838585 0.254223 -0.0390542 0.0460466 -0.013887 -0.00859938 -0.00107468 -0.00485305 0.38216 0.270178 1 0.754878 0.125331 0.297526 0.381355 0.0636213 0.067018 -0.0637622 0.189743 0.259924 0.208412 0.320585 0.440723 0.0238506 0.00233076 -0.079635 -0.0027676 -0.00445873 nan nan nan nan 0.0575597 -0.0132939 0.0187393 0.12935 0.0155649 0.0379043 -0.0396431 -0.0164143 -0.00605558 0.0258083 0.346195 0.0223643 0.0443078 -0.0193469 0.168107 -0.00431037 -0.00401501 -0.00208664 0.561705 0.0264624 -0.0685178 0.0463047 0.00654293 -0.00521119 -0.00469154 -0.0112569 -0.00325605
open_il_24m 0.0315075 0.0315075 0.0314859 0.0627368 0.171914 0.0411935 0.0610861 0.102467 -0.00431884 nan 0.180339 -0.0258594 0.00875091 -0.017437 -0.0174411 0.117973 0.0373714 0.0540555 0.237871 0.0381032 -0.0244266 -0.0664247 0.348491 -0.0274876 0.0300871 -0.0119551 -0.00526274 0.00566236 -0.00670858 0.29324 0.35514 0.754878 1 0.165277 0.365932 0.396888 0.0518012 0.0782004 -0.0536387 0.208603 0.343601 0.2814 0.264266 0.574646 0.0240117 0.00714786 -0.0613072 -0.00237151 -0.00463704 nan nan nan nan 0.0783013 -0.0170853 0.0081386 0.151223 0.00441684 0.0280061 -0.0254716 0.000797204 0.000984645 0.0392207 0.470968 0.0505349 0.0642075 0.00639011 0.239377 -0.00534064 -0.00540845 -0.022563 0.431285 0.0651893 -0.0520546 0.0526413 0.00415751 -0.00377276 -0.00241752 -0.0122527 -0.00416827
mths_since_rcnt_il 0.0499328 0.0499328 0.0499276 0.0352736 -0.0102362 0.0420945 0.0523735 0.0499242 -0.00258569 nan 0.11519 0.0277418 -0.036872 -0.00255944 -0.00256083 0.0198792 0.0681927 0.0218196 0.108818 0.0166762 0.0215199 0.01289 0.16105 0.0124691 0.049313 -0.00150543 0.00420784 0.00630324 0.00667966 0.0465493 0.147997 0.125331 0.165277 1 0.134028 0.307842 -6.7887e-05 -0.000691562 0.0229712 0.100231 0.0744073 0.0835565 0.0620311 0.0881004 0.0497583 -0.00460383 0.01612 0.00873132 0.0020377 nan nan nan nan 0.0741814 -0.00596338 0.0249249 0.0588964 0.0323102 0.0352813 -0.00245439 0.0172447 0.00293162 0.0289562 0.184509 0.0271208 0.0518723 0.0171077 0.108734 0.00112023 0.00523758 0.0163838 0.0683615 -0.0239216 0.0133864 0.0212014 0.00247218 -0.000141324 0.00206934 -0.00470371 -0.00408784
total_bal_il 0.157087 0.157087 0.157077 0.0769668 0.0376085 0.146332 -0.00350992 0.201533 0.0028623 nan 0.192938 0.0485201 -0.0285505 0.0117294 0.0117265 0.0504319 0.0643094 -0.0264372 0.342034 -0.0223009 0.0925043 0.0328792 0.407291 0.0131395 0.0439697 0.011471 0.0306911 0.0286964 0.00867382 0.11244 0.556975 0.297526 0.365932 0.134028 1 0.345969 -0.000897383 -0.00296172 0.0971311 0.294235 0.145066 0.11999 0.149457 0.190607 0.191673 0.0400949 0.0233544 0.00109849 -0.000601912 nan nan nan nan 0.0811057 0.00422169 0.0201831 0.0801214 0.024244 0.0713369 0.0257781 0.0279362 0.0347752 0.042446 0.579082 0.0427055 0.0503017 0.0300477 0.34343 0.00390574 0.00531998 0.0370412 0.156345 0.00849149 0.0168712 -0.0283581 0.000789849 0.0222411 0.0140907 0.00881707 0.00681597
il_util -0.0383439 -0.0383439 -0.0383544 0.042335 0.125128 -0.0330059 -0.019046 0.00228085 -0.0295579 nan 0.165749 0.0141296 0.0601059 -0.0818219 -0.0818255 0.0717362 0.0429723 0.0319805 0.21282 0.0232084 -0.0592588 -0.00230849 0.185937 -0.0197974 0.0624107 -0.0107439 -0.0107355 0.00122126 -0.00438624 0.161022 0.426248 0.381355 0.396888 0.307842 0.345969 1 0.0298274 0.0387439 -0.0700049 0.511279 0.151635 0.0935839 0.149637 0.231345 -0.036966 -0.0735395 0.00819367 0.00013921 -0.00579825 nan nan nan nan -0.0222128 -0.00788238 0.0086735 0.113513 0.0106431 0.0716097 -0.0312738 -0.00813602 -0.0307332 -0.0396905 0.328646 -0.0140658 -0.0209398 -0.0139502 0.2131 -0.00287632 -0.00583002 0.0332349 0.216616 -0.0211727 0.00629931 0.0326699 -0.00220551 -0.0121975 -0.0122938 -0.0133814 -0.00771851
open_rv_12m -0.034697 -0.034697 -0.0347373 -0.0146462 0.150425 -0.00946319 0.00905316 -0.00400232 -0.0136302 nan -0.000557731 -0.0234147 0.00055375 -0.109483 -0.109485 0.341016 0.0417107 0.093185 0.369619 0.0807145 -0.00142028 -0.197121 0.257423 -0.0463635 0.0929728 -0.0262963 -0.0234255 -0.0239755 -0.00527453 0.616661 -0.0165188 0.0636213 0.0518012 -6.7887e-05 -0.000897383 0.0298274 1 0.773266 -0.0893363 -0.161308 0.0942083 0.00654725 0.311212 0.648032 -0.149873 0.130773 -0.15384 -0.000782398 -0.00272689 nan nan nan nan -0.0264225 0.0543993 0.0314165 0.158242 0.0336248 0.070966 0.273071 0.369663 0.332636 0.300954 0.0124119 0.455826 0.376212 0.356516 0.367311 -0.00472218 -0.00491244 -0.000297686 0.837077 0.00695222 -0.155791 0.0796004 0.0217304 -0.0186303 -0.0179965 -0.0269882 -0.0106544
open_rv_24m -0.017261 -0.017261 -0.0173085 -0.00613441 0.158809 0.00898484 0.00405258 -0.0103785 -0.0139293 nan 0.0239507 -0.0489541 0.024354 -0.133455 -0.133456 0.298656 0.0353053 0.122713 0.473997 0.103701 0.00277935 -0.215097 0.321715 -0.044386 0.0899842 -0.0262304 -0.0238085 -0.0222214 -0.00443329 0.474521 -0.0177898 0.067018 0.0782004 -0.000691562 -0.00296172 0.0387439 0.773266 1 -0.103536 -0.175801 0.13306 0.0131964 0.297171 0.842025 -0.196518 0.147483 -0.152172 0.00408837 -0.00160363 nan nan nan nan -0.0467358 0.0640103 0.02146 0.199499 0.0255257 0.0694876 0.346665 0.461232 0.412793 0.377464 0.0117532 0.584674 0.476877 0.45121 0.471632 -0.00401581 -0.00452248 -0.0229226 0.654754 0.0397782 -0.153979 0.111202 0.0252402 -0.020129 -0.0217354 -0.0307799 -0.0121039
max_bal_bc 0.353818 0.353818 0.353833 0.0903663 -0.0424092 0.331862 0.0713191 0.24945 -0.0016616 nan 0.0851506 -0.0515156 -0.190607 0.0460722 0.0460666 -0.0542851 -0.0817066 -0.144157 0.106785 -0.114796 0.546885 0.3258 0.121812 0.0388421 -0.115907 0.000584367 0.0234647 0.00748965 -0.00264773 -0.0835329 0.0102881 -0.0637622 -0.0536387 0.0229712 0.0971311 -0.0700049 -0.0893363 -0.103536 1 0.131186 -0.0786183 -0.0194046 -0.078415 -0.0993511 0.255621 0.177813 0.305404 -0.016073 -0.00398507 nan nan nan nan 0.212922 0.108332 -0.0603317 -0.0491943 -0.062233 -0.0997827 0.217399 0.124489 0.206914 0.182499 0.0141902 0.0895558 0.116729 0.11929 0.106525 -0.00680483 0.00441199 -0.0462697 -0.0968394 0.128772 0.217872 -0.129389 -0.0217796 0.0469201 0.0172944 0.00881723 -0.0134969
all_util 0.0146875 0.0146875 0.0146675 0.050348 0.25345 0.0367535 -0.0147168 0.0067374 -0.0746282 nan 0.170675 0.0213504 0.0536393 -0.416169 -0.41617 -0.048355 0.053505 -0.0136064 -0.0123935 -0.00883367 0.109251 0.65511 -0.00023709 -0.0430698 0.0576812 0.016174 0.0138116 0.0274741 -0.0250433 -0.0462331 0.371163 0.189743 0.208603 0.100231 0.294235 0.511279 -0.161308 -0.175801 0.131186 1 0.0754531 0.062601 0.0274413 -0.0408526 0.0913201 -0.47187 0.565641 -0.00831581 -0.00685925 nan nan nan nan -0.0244293 -0.0292938 0.018079 0.0191668 0.0149528 0.0760154 -0.0328463 -0.0213858 -0.2125 -0.220668 0.256207 -0.249764 -0.228374 -0.0026557 -0.0110648 -0.00707463 -0.0217334 0.0324189 -0.0402328 -0.0762987 0.478858 -0.0158144 -0.00264618 0.00676926 -0.00215361 0.016229 -0.00191463
inq_fi 0.00764786 0.00764786 0.00763789 0.027585 0.145232 0.0209666 0.0092254 0.053162 0.0260245 nan 0.055609 0.0229035 0.0292811 -0.0815677 -0.0815686 0.218088 0.073993 0.0762482 0.108395 0.0674735 -0.0468321 -0.0734558 0.16573 -0.0301753 0.069348 0.0142427 0.0154923 0.0204248 -0.00524045 0.153737 0.0892446 0.259924 0.343601 0.0744073 0.145066 0.151635 0.0942083 0.13306 -0.0786183 0.0754531 1 0.0898335 0.563693 0.302884 0.0517894 -0.00437465 -0.0618109 0.0107957 -0.000685353 nan nan nan nan 0.090355 0.00264659 0.041065 0.211763 0.0369582 0.0677987 -0.000722646 0.00851313 0.0132135 0.049748 0.173357 0.0622347 0.0629156 0.021789 0.109912 -0.00292626 -0.0065604 0.00873659 0.228857 -0.017259 -0.0614376 0.0731705 0.0202849 0.00407757 0.0118063 0.00628767 0.0133634
total_cu_tl 0.0695528 0.0695528 0.0695417 0.0538909 0.0204437 0.0597467 0.0834455 0.0419613 0.00432069 nan 0.09578 0.0172532 -0.0403152 -0.00267085 -0.00267086 0.0313672 0.0386346 0.00847277 0.101001 -0.00157323 0.0357439 0.0355184 0.29089 0.0112963 0.00367465 0.0281037 0.0326106 0.036244 0.00434626 0.0913306 0.0855759 0.208412 0.2814 0.0835565 0.11999 0.0935839 0.00654725 0.0131964 -0.0194046 0.062601 0.0898335 1 0.0789875 0.165896 0.0591046 -0.0162122 -0.0169612 0.00400361 0.00055474 nan nan nan nan 0.165511 -0.0855086 0.000451111 0.0373086 0.025179 -0.00219945 -0.0817919 0.0158443 -0.0685984 0.00226729 0.304966 0.0507981 0.112094 0.0266929 0.101881 0.00140141 0.00234959 -0.00263983 0.121665 0.0470462 -0.000464966 0.0172131 -0.014358 0.0245087 0.0345538 0.0207548 0.00824441
inq_last_12m 0.0164975 0.0164975 0.0164552 0.0303021 0.1967 0.037195 0.000861288 0.064537 -0.00388903 nan 0.0308119 0.0264162 0.0223407 -0.126787 -0.126788 0.484791 0.074587 0.105516 0.184489 0.0904722 -0.0253142 -0.123023 0.213623 -0.0453963 0.0917672 0.00287337 0.00616531 0.00762963 -0.00538684 0.308557 0.0774622 0.320585 0.264266 0.0620311 0.149457 0.149637 0.311212 0.297171 -0.078415 0.0274413 0.563693 0.0789875 1 0.398087 0.0410276 0.0248997 -0.101845 0.00925652 -0.003314 nan nan nan nan 0.0974853 0.0163791 0.0450641 0.283332 0.0430992 0.0777948 0.0551404 0.0958435 0.0992322 0.120394 0.148112 0.159422 0.155754 0.0933912 0.183727 -0.00495738 -0.00562695 0.0160847 0.448994 -0.0192305 -0.103555 0.104239 0.0228439 -0.00750618 0.00157327 -0.00676538 0.00452644
acc_open_past_24mths 0.019626 0.019626 0.0195828 0.0368651 0.208535 0.0424672 0.038457 0.0610748 -0.00624408 nan 0.101628 -0.0585419 0.021144 -0.10921 -0.109214 0.303038 0.047586 0.123906 0.51123 0.0998176 -0.00365459 -0.209715 0.455349 -0.0452869 0.0852758 -0.0214221 -0.0147903 -0.0104125 -0.00690889 0.549247 0.164241 0.440723 0.574646 0.0881004 0.190607 0.231345 0.648032 0.842025 -0.0993511 -0.0408526 0.302884 0.165896 0.398087 1 -0.0803721 0.133892 -0.15783 0.00283212 -0.00440563 nan nan nan nan 0.0719896 0.0427255 0.0199098 0.256422 0.020815 0.0679818 0.264383 0.368763 0.335204 0.328338 0.2571 0.496488 0.42004 0.363311 0.510226 -0.00651912 -0.00613127 -0.0341718 0.771942 0.0729495 -0.154705 0.114637 0.0208618 -0.0135357 -0.00818275 -0.0270999 -0.0106693
avg_cur_bal 0.215998 0.215998 0.216055 0.0757197 -0.0883193 0.184681 0.0955353 0.287459 0.0272625 nan -0.0724721 0.0317852 -0.118321 0.10444 0.104442 -0.0440917 0.0490007 -0.0736874 -0.124814 -0.0579094 0.281683 0.142375 0.0360446 0.0415878 0.0144104 0.0427614 0.0676954 0.0302414 0.0139345 -0.0335654 -0.0494235 0.0238506 0.0240117 0.0497583 0.191673 -0.036966 -0.149873 -0.196518 0.255621 0.0913201 0.0517894 0.0591046 0.0410276 -0.0803721 1 0.028102 0.0720545 0.00425261 0.0225698 nan nan nan nan 0.462622 -0.0506585 0.0296503 0.0319576 0.0231014 0.013306 -0.113039 -0.149751 -0.118107 -0.0580143 0.0448754 -0.200781 -0.0945567 -0.153349 -0.125071 0.00738516 0.013113 0.00446792 -0.0495513 -0.0435262 0.0709961 -0.0807372 -0.00148414 0.0469711 0.0618051 0.0416053 0.0160454
bc_open_to_buy 0.188939 0.188939 0.188967 0.0209426 -0.244328 0.150411 0.0212737 0.153362 0.0629564 nan -0.063867 -0.061073 -0.140977 0.507928 0.507929 0.0318196 -0.111267 -0.113859 0.305943 -0.0907413 0.150496 -0.464889 0.244732 0.0553371 -0.125023 -0.0195873 -0.00159911 -0.0224684 0.0184271 0.099022 -0.0312765 0.00233076 0.00714786 -0.00460383 0.0400949 -0.0735395 0.130773 0.147483 0.177813 -0.47187 -0.00437465 -0.0162122 0.0248997 0.133892 0.028102 1 -0.500072 -0.0114597 0.00159527 nan nan nan nan 0.133413 0.0748705 -0.063607 0.00754838 -0.0748285 -0.0818461 0.245744 0.105946 0.484976 0.459524 -0.00451172 0.371504 0.331844 0.122936 0.309204 0.00566647 0.0168384 -0.0442112 0.117169 0.165385 -0.411689 -0.0997203 -0.0197553 0.0149981 0.00844124 -0.0193241 -0.00872533
bc_util 0.0761079 0.0761079 0.0760853 0.0446585 0.210012 0.098197 0.0420823 0.0121136 -0.0728878 nan 0.150757 -0.00470567 -0.0234079 -0.470744 -0.470741 -0.0897593 0.0182409 -0.0168851 -0.0741913 -0.0153993 0.191461 0.841099 -0.0744188 -0.028589 0.0105724 0.019681 0.0183556 0.0295882 -0.0292296 -0.160855 0.045203 -0.079635 -0.0613072 0.01612 0.0233544 0.00819367 -0.15384 -0.152172 0.305404 0.565641 -0.0618109 -0.0169612 -0.101845 -0.15783 0.0720545 -0.500072 1 -0.00983678 -0.00690652 nan nan nan nan 0.0105076 0.197854 -0.000993264 -0.0406667 0.00334719 -0.0165676 0.117683 0.154295 -0.128802 -0.153372 0.0135313 -0.124005 -0.123005 0.161241 -0.0750139 -0.012314 -0.0213142 -0.0120727 -0.171062 -0.0324937 0.845272 -0.018847 -0.00370829 0.0106302 -0.00364648 0.0180859 -0.00833916
chargeoff_within_12_mths -0.00195514 -0.00195514 -0.00196807 -0.00225203 0.00647974 -0.000521834 0.00673721 0.00481134 0.00153744 nan -0.00259618 0.152355 -0.0266618 -0.0528395 -0.052837 0.0102862 0.0702054 -0.011433 0.005978 -0.00895587 -0.00967216 -0.0120491 0.0361676 -0.000363524 0.122622 -0.00785714 -0.00702118 -0.00673212 0.0425989 0.0018867 -0.00194676 -0.0027676 -0.00237151 0.00873132 0.00109849 0.00013921 -0.000782398 0.00408837 -0.016073 -0.00831581 0.0107957 0.00400361 0.00925652 0.00283212 0.00425261 -0.0114597 -0.00983678 1 0.0105278 nan nan nan nan 0.019213 0.000453322 0.0558066 0.00759287 0.0591205 0.114215 -0.00658847 0.0038822 -0.00124331 0.0293433 0.00961118 0.00677495 0.038421 0.00106656 0.00553998 0.0359113 0.000856444 0.227076 -0.00218708 -0.0806482 -0.0101618 -0.0154123 -0.000991514 -0.0034651 -0.00369874 -0.00350607 0.0031394
delinq_amnt 0.00276253 0.00276253 0.00274976 0.000162738 0.00431558 0.00357904 0.00203488 0.00818791 -0.00383244 nan -0.00724449 0.035054 -0.00867312 -0.0151574 -0.0151572 -0.00224899 0.0207741 0.000426055 0.00331224 0.00281559 0.00223265 -0.00763357 0.0043422 -0.00318297 0.0333099 -0.0028434 -0.00256141 -0.0026222 0.203697 -0.00455355 -0.00280131 -0.00445873 -0.00463704 0.0020377 -0.000601912 -0.00579825 -0.00272689 -0.00160363 -0.00398507 -0.00685925 -0.000685353 0.00055474 -0.003314 -0.00440563 0.0225698 0.00159527 -0.00690652 0.0105278 1 nan nan nan nan 0.0150131 -0.00164215 0.00571492 -0.0035332 0.00885733 0.0220937 -0.001736 0.00036987 -0.000381842 0.00136466 -0.00188584 0.00155509 0.00386748 -0.000938581 -0.00123076 0.367772 0.0381949 0.0505135 -0.00524767 -0.0171228 -0.00763145 -0.0029183 0.0054966 -0.00196265 -0.00178863 -0.00237695 -0.00176629
mo_sin_old_il_acct nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
mo_sin_old_rev_tl_op nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
mo_sin_rcnt_rev_tl_op nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
mo_sin_rcnt_tl nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
mort_acc 0.212504 0.212504 0.212543 0.0901313 -0.111063 0.175254 0.158505 0.198136 -0.0154189 nan -0.0128473 0.0571907 -0.27796 0.104666 0.104667 0.0126668 0.107099 -0.0264072 0.128376 -0.0255759 0.216294 0.0337002 0.347698 0.0511063 0.066741 0.0332519 0.0503214 0.0289413 0.0262094 0.0499448 -0.0309252 0.0575597 0.0783013 0.0741814 0.0811057 -0.0222128 -0.0264225 -0.0467358 0.212922 -0.0244293 0.090355 0.165511 0.0974853 0.0719896 0.462622 0.133413 0.0105076 0.019213 0.0150131 nan nan nan nan 1 0.000482828 0.0538963 0.0706693 0.0616658 0.0442576 0.0428493 0.0609731 0.0794778 0.180937 0.0846755 0.0840111 0.211166 0.0582078 0.126177 0.0097598 0.0217753 0.0132967 0.0666973 -0.0144858 0.0126638 -0.0174074 -0.0178562 0.0361079 0.0868426 0.0186783 0.00742479
mths_since_recent_bc 0.0462264 0.0462264 0.0462363 0.0152259 -0.0289126 0.0421676 0.00463817 0.0176033 -0.00796961 nan 0.00830909 -0.0158934 -0.00855518 0.00784537 0.00784578 0.0173994 -0.0147326 0.00549048 0.0907996 0.005651 0.0414679 0.0104042 0.0360243 0.0128764 -0.00798391 -0.0190638 -0.0127925 -0.0170789 -0.00634398 0.0257039 -0.00353267 -0.0132939 -0.0170853 -0.00596338 0.00422169 -0.00788238 0.0543993 0.0640103 0.108332 -0.0292938 0.00264659 -0.0855086 0.0163791 0.0427255 -0.0506585 0.0748705 0.197854 0.000453322 -0.00164215 nan nan nan nan 0.000482828 1 -0.0010856 0.016384 -0.0138369 -0.0130127 0.165875 0.103923 0.163902 0.116076 -0.0184111 0.110806 0.069867 0.105189 0.0908174 -0.0011123 -0.0060109 -0.0106523 0.0373853 0.0315371 0.116542 0.00425672 0.00565532 -0.00607725 -0.0114841 -0.0225751 -0.014197
mths_since_recent_bc_dlq -0.0068254 -0.0068254 -0.00685981 -0.00936181 0.0239265 -0.000387367 0.0261163 0.0295602 -0.0223056 nan -0.0249932 0.317727 -0.120467 -0.202268 -0.202266 0.0349001 0.550657 -0.05204 0.0272822 -0.0322887 -0.0326004 -0.00121411 0.126063 -0.00611035 0.383217 -0.00984521 -0.00496629 -0.0111498 0.0495909 0.0328249 0.0142301 0.0187393 0.0081386 0.0249249 0.0201831 0.0086735 0.0314165 0.02146 -0.0603317 0.018079 0.041065 0.000451111 0.0450641 0.0199098 0.0296503 -0.063607 -0.000993264 0.0558066 0.00571492 nan nan nan nan 0.0538963 -0.0010856 1 0.0289446 0.777202 0.319289 -0.01046 0.00367685 -0.0108362 0.14231 0.038387 0.0165989 0.136835 0.00477741 0.0263332 0.0124995 0.045094 0.114447 0.0341549 -0.520245 -0.00849793 -0.0823344 0.0113915 -0.0135996 -0.00428139 -0.00994155 0.016833
mths_since_recent_inq 0.0138877 0.0138877 0.0138697 0.0248917 0.095931 0.0207774 0.00435118 0.0316428 0.000616289 nan 0.0291981 0.0128847 0.019383 -0.094217 -0.0942215 0.209879 0.0537591 0.0695888 0.128645 0.0573158 -0.00693511 -0.063428 0.132395 -0.0138002 0.0621868 0.00122771 0.00317881 0.00414795 -0.000893305 0.151626 0.0499328 0.12935 0.151223 0.0588964 0.0801214 0.113513 0.158242 0.199499 -0.0491943 0.0191668 0.211763 0.0373086 0.283332 0.256422 0.0319576 0.00754838 -0.0406667 0.00759287 -0.0035332 nan nan nan nan 0.0706693 0.016384 0.0289446 1 0.0319992 0.0444397 0.0569204 0.090525 0.0780788 0.079089 0.0785013 0.112131 0.107386 0.0857591 0.127483 -0.00225567 -0.00016021 0.00881176 0.211114 -0.017555 -0.0454136 0.0656321 0.0119967 -0.00719881 0.00214206 -0.00855814 -0.000433285
mths_since_recent_revol_delinq -0.00389924 -0.00389924 -0.00393152 -0.0117617 0.0266176 0.00415115 0.030035 0.0280972 -0.0197745 nan -0.0196061 0.352637 -0.132552 -0.236348 -0.236346 0.0338875 0.707576 -0.0663142 0.0595541 -0.0431191 -0.0274949 -0.00476487 0.153044 -0.00568527 0.40456 -0.00934307 -0.00351716 -0.0104194 0.062975 0.035032 0.0199633 0.0155649 0.00441684 0.0323102 0.024244 0.0106431 0.0336248 0.0255257 -0.062233 0.0149528 0.0369582 0.025179 0.0430992 0.020815 0.0231014 -0.0748285 0.00334719 0.0591205 0.00885733 nan nan nan nan 0.0616658 -0.0138369 0.777202 0.0319992 1 0.302119 -0.0252805 0.0250356 -0.0238033 0.108153 0.0488154 0.050745 0.165765 0.0252108 0.0581276 0.0164159 0.0572876 0.116126 0.0339703 -0.550831 -0.00396584 -0.0982129 0.00937857 -0.0130986 -0.0033705 -0.00786351 0.0195779
num_accts_ever_120_pd -0.043254 -0.043254 -0.0432889 -0.0171136 0.0362125 -0.0346932 0.00493574 0.00959516 -0.00689764 nan -0.0274789 0.216559 -0.0854754 -0.194331 -0.194328 0.0462168 0.362446 -0.0108138 0.0314553 0.00321878 -0.070373 -0.0197588 0.145121 -0.0122598 0.584653 -0.00791016 -0.00422056 -0.00874896 0.0182332 0.0597546 0.0883981 0.0379043 0.0280061 0.0352813 0.0713369 0.0716097 0.070966 0.0694876 -0.0997827 0.0760154 0.0677987 -0.00219945 0.0777948 0.0679818 0.013306 -0.0818461 -0.0165676 0.114215 0.0220937 nan nan nan nan 0.0442576 -0.0130127 0.319289 0.0444397 0.302119 1 -0.0440399 -0.0241404 -0.0627028 0.0522256 0.145643 -0.0189137 0.0642757 -0.0157037 0.0322784 0.0322363 -0.00596655 0.317129 0.0752611 -0.589166 -0.0238768 -0.0468898 0.00973285 -0.0138636 -0.00630334 -0.00817286 0.0199888
num_actv_bc_tl 0.199607 0.199607 0.199572 0.0527023 0.0534045 0.20317 0.0758387 0.106956 -0.0152928 nan 0.126229 -0.0328287 -0.116417 -0.120806 -0.120807 0.101559 -0.0361648 -0.0466023 0.551326 -0.029659 0.309452 0.114271 0.304611 0.00199673 -0.0278289 -0.0311423 -0.0160728 -0.018663 -0.00295717 0.134118 -0.0070089 -0.0396431 -0.0254716 -0.00245439 0.0257781 -0.0312738 0.273071 0.346665 0.217399 -0.0328463 -0.000722646 -0.0817919 0.0551404 0.264383 -0.113039 0.245744 0.117683 -0.00658847 -0.001736 nan nan nan nan 0.0428493 0.165875 -0.01046 0.0569204 -0.0252805 -0.0440399 1 0.820425 0.82043 0.595364 -0.0194805 0.650644 0.45893 0.808971 0.548949 -0.00628703 0.000230749 -0.029503 0.197673 0.109193 0.0581039 -0.056417 0.0133264 0.00282292 -0.016675 -0.0271403 -0.0203711
num_actv_rev_tl 0.161251 0.161251 0.161207 0.0559652 0.0947076 0.168627 0.111782 0.0792329 -0.032089 nan 0.181384 0.00268074 -0.15247 -0.19519 -0.19519 0.156783 0.0130495 0.00624367 0.661589 0.00564454 0.309683 0.122157 0.404457 -0.00581582 0.0208443 -0.0208105 -0.00925781 -0.00485167 0.00554845 0.206671 -0.0021399 -0.0164143 0.000797204 0.0172447 0.0279362 -0.00813602 0.369663 0.461232 0.124489 -0.0213858 0.00851313 0.0158443 0.0958435 0.368763 -0.149751 0.105946 0.154295 0.0038822 0.00036987 nan nan nan nan 0.0609731 0.103923 0.00367685 0.090525 0.0250356 -0.0241404 0.820425 1 0.666762 0.489308 0.014075 0.776673 0.574335 0.975518 0.656357 -0.00478624 0.00764551 -0.0145671 0.286416 0.084511 0.111197 -0.00444145 0.00899694 0.0039281 -0.0146125 -0.0214485 -0.0174936
num_bc_sats 0.220138 0.220138 0.220116 0.0602925 -0.0243449 0.209793 0.0688841 0.123213 0.00675165 nan 0.0754692 -0.026932 -0.13836 0.0642446 0.0642422 0.15623 -0.0490637 -0.0602441 0.6326 -0.044667 0.279112 -0.11625 0.412615 0.0205778 -0.038223 -0.0395794 -0.0229321 -0.029926 0.00178831 0.1967 -0.0173969 -0.00605558 0.000984645 0.00293162 0.0347752 -0.0307332 0.332636 0.412793 0.206914 -0.2125 0.0132135 -0.0685984 0.0992322 0.335204 -0.118107 0.484976 -0.128802 -0.00124331 -0.000381842 nan nan nan nan 0.0794778 0.163902 -0.0108362 0.0780788 -0.0238033 -0.0627028 0.82043 0.666762 1 0.754651 -0.00595212 0.749365 0.598343 0.637555 0.628036 -0.00275183 0.00194617 -0.0238053 0.267746 0.13986 -0.133136 -0.0644459 0.00332998 0.000639426 -0.0159482 -0.0355712 -0.022067
num_bc_tl 0.199425 0.199425 0.199397 0.0500086 -0.0745048 0.184321 0.0899515 0.128081 -0.0233043 nan 0.0552756 0.0308758 -0.263238 0.0737801 0.0737779 0.141211 0.0499351 -0.0142455 0.5457 -0.0180016 0.232235 -0.150926 0.621164 0.0304233 0.0409552 -0.0383825 -0.0208823 -0.0306313 0.0176981 0.191733 -0.0285025 0.0258083 0.0392207 0.0289562 0.042446 -0.0396905 0.300954 0.377464 0.182499 -0.220668 0.049748 0.00226729 0.120394 0.328338 -0.0580143 0.459524 -0.153372 0.0293433 0.00136466 nan nan nan nan 0.180937 0.116076 0.14231 0.079089 0.108153 0.0522256 0.595364 0.489308 0.754651 1 0.0427789 0.651757 0.836398 0.487024 0.54168 0.00525794 0.0146068 0.0110842 0.260238 0.0655869 -0.145449 -0.00593851 -0.010921 -0.00109308 -0.00938568 -0.0375617 -0.0194202
num_il_tl 0.0729088 0.0729088 0.0728883 0.0653109 0.0154559 0.0598414 0.00275573 0.0929922 -0.00598991 nan 0.157749 0.082474 -0.0181363 -0.0171835 -0.0171868 0.0668369 0.120277 -0.0135545 0.38191 -0.0178215 0.0125949 0.0158534 0.69256 0.0147074 0.0925125 -0.00106958 0.00943032 0.0145116 0.00768579 0.136626 0.630083 0.346195 0.470968 0.184509 0.579082 0.328646 0.0124119 0.0117532 0.0141902 0.256207 0.173357 0.304966 0.148112 0.2571 0.0448754 -0.00451172 0.0135313 0.00961118 -0.00188584 nan nan nan nan 0.0846755 -0.0184111 0.038387 0.0785013 0.0488154 0.145643 -0.0194805 0.014075 -0.00595212 0.0427789 1 0.0458202 0.0864782 0.0192931 0.383402 0.00119655 0.00493631 0.0727621 0.194377 -0.00339816 0.0173363 -0.00935613 -0.0159769 0.00496934 0.00691429 -0.00402448 0.000384541
num_op_rev_tl 0.160412 0.160412 0.160384 0.0507575 0.00648551 0.156451 0.0982971 0.0782448 -0.00992501 nan 0.125151 0.0110338 -0.169198 0.0134573 0.0134556 0.175681 0.0219983 0.0195522 0.840337 0.0107301 0.226513 -0.201499 0.577508 0.0106175 -0.00827835 -0.0229003 -0.00966986 -0.0107856 0.0167498 0.277622 -0.00685534 0.0223643 0.0505349 0.0271208 0.0427055 -0.0140658 0.455826 0.584674 0.0895558 -0.249764 0.0622347 0.0507981 0.159422 0.496488 -0.200781 0.371504 -0.124005 0.00677495 0.00155509 nan nan nan nan 0.0840111 0.110806 0.0165989 0.112131 0.050745 -0.0189137 0.650644 0.776673 0.749365 0.651757 0.0458202 1 0.794653 0.782106 0.839103 0.00199449 0.0154537 -0.0225507 0.379347 0.130167 -0.123017 0.0181009 0.000311432 0.00652393 -0.00947076 -0.0276161 -0.0144157
num_rev_accts 0.163052 0.163052 0.163024 0.0501363 -0.0537557 0.149827 0.11622 0.0984909 -0.0268819 nan 0.097101 0.0751926 -0.307887 0.0295203 0.0295188 0.172687 0.103071 0.035464 0.671288 0.0135757 0.215664 -0.181344 0.761235 0.0249612 0.0716812 -0.0268364 -0.0121454 -0.0173175 0.0284013 0.245263 -0.0186272 0.0443078 0.0642075 0.0518723 0.0503017 -0.0209398 0.376212 0.476877 0.116729 -0.228374 0.0629156 0.112094 0.155754 0.42004 -0.0945567 0.331844 -0.123005 0.038421 0.00386748 nan nan nan nan 0.211166 0.069867 0.136835 0.107386 0.165765 0.0642757 0.45893 0.574335 0.598343 0.836398 0.0864782 0.794653 1 0.570161 0.665393 0.00804668 0.0240981 0.0238512 0.329466 0.0533712 -0.112284 0.0486757 -0.0196402 0.00282971 -0.00336047 -0.0300403 -0.0132694
num_rev_tl_bal_gt_0 0.157129 0.157129 0.157086 0.0528589 0.0955494 0.165577 0.112049 0.077653 -0.0363046 nan 0.184929 -0.00182585 -0.149252 -0.195455 -0.195455 0.129672 0.0165125 0.00290667 0.663984 0.00396216 0.308051 0.130536 0.405051 -0.00742326 0.00823319 -0.0160854 -0.00449231 0.000179617 0.00383753 0.19597 0.00133266 -0.0193469 0.00639011 0.0171077 0.0300477 -0.0139502 0.356516 0.45121 0.11929 -0.0026557 0.021789 0.0266929 0.0933912 0.363311 -0.153349 0.122936 0.161241 0.00106656 -0.000938581 nan nan nan nan 0.0582078 0.105189 0.00477741 0.0857591 0.0252108 -0.0157037 0.808971 0.975518 0.637555 0.487024 0.0192931 0.782106 0.570161 1 0.662889 -0.00403072 0.00647178 -0.0221875 0.273814 0.0898938 0.117669 -0.00757742 0.010145 0.00735437 -0.0124839 -0.0186286 -0.0158036
num_sats 0.176213 0.176213 0.176186 0.0707827 0.0108543 0.16713 0.0659711 0.125925 -0.0163005 nan 0.195886 0.049985 -0.133176 0.0240883 0.0240852 0.163325 0.0554331 -0.0173759 0.998662 -0.019072 0.215405 -0.135513 0.708356 0.0153477 0.0135839 -0.0177416 0.000173295 -0.000113842 0.0115189 0.280231 0.517462 0.168107 0.239377 0.108734 0.34343 0.2131 0.367311 0.471632 0.106525 -0.0110648 0.109912 0.101881 0.183727 0.510226 -0.125071 0.309204 -0.0750139 0.00553998 -0.00123076 nan nan nan nan 0.126177 0.0908174 0.0263332 0.127483 0.0581276 0.0322784 0.548949 0.656357 0.628036 0.54168 0.383402 0.839103 0.665393 0.662889 1 0.000914236 0.00987463 0.0141552 0.389719 0.10137 -0.0758054 -0.0192534 -0.00727364 0.0113808 -0.00345781 -0.0229484 -0.0119976
num_tl_120dpd_2m 0.0013363 0.0013363 0.00131546 -0.00132722 0.00235276 0.00192625 0.00146513 0.00766741 -0.00344101 nan -0.00494301 0.0472085 -0.0106247 -0.0180953 -0.018095 -0.00281417 0.0287721 0.00103694 0.00466632 0.00475399 -0.00285345 -0.0146131 0.00882538 0.000884112 0.0468141 -0.00323222 -0.0029595 -0.0031668 0.40493 -0.00618333 0.000432986 -0.00431037 -0.00534064 0.00112023 0.00390574 -0.00287632 -0.00472218 -0.00401581 -0.00680483 -0.00707463 -0.00292626 0.00140141 -0.00495738 -0.00651912 0.00738516 0.00566647 -0.012314 0.0359113 0.367772 nan nan nan nan 0.0097598 -0.0011123 0.0124995 -0.00225567 0.0164159 0.0322363 -0.00628703 -0.00478624 -0.00275183 0.00525794 0.00119655 0.00199449 0.00804668 -0.00403072 0.000914236 1 0.00108827 0.0728438 -0.00668362 -0.0202917 -0.0136826 -0.00218899 0.00807619 -0.00224183 -0.00236635 -0.00320265 -0.000394056
num_tl_30dpd 0.00312459 0.00312459 0.00310217 -0.00163741 0.00567208 0.00431939 0.0105402 0.0105974 -0.00769706 nan 0.001442 0.103201 -0.0273597 -0.0321727 -0.0321722 -0.00666443 0.0562023 -0.00998001 0.0196159 -0.00658831 0.0069155 -0.0215178 0.0229275 -0.000503387 -0.00173777 0.000418642 0.0019704 0.000314872 0.795964 -0.00669984 0.00593036 -0.00401501 -0.00540845 0.00523758 0.00531998 -0.00583002 -0.00491244 -0.00452248 0.00441199 -0.0217334 -0.0065604 0.00234959 -0.00562695 -0.00613127 0.013113 0.0168384 -0.0213142 0.000856444 0.0381949 nan nan nan nan 0.0217753 -0.0060109 0.045094 -0.00016021 0.0572876 -0.00596655 0.000230749 0.00764551 0.00194617 0.0146068 0.00493631 0.0154537 0.0240981 0.00647178 0.00987463 0.00108827 1 0.0047266 -0.0061361 -0.0376677 -0.0183645 -0.0144378 0.00583894 -0.00191861 -0.0017072 -0.00297742 -5.5049e-05
num_tl_90g_dpd_24m -0.0185283 -0.0185283 -0.018561 -0.0113272 0.0198795 -0.0139407 -0.00294973 0.00193351 -0.00695552 nan -0.0118116 0.665355 -0.0329759 -0.101024 -0.101022 0.0277039 0.152675 -0.0169895 0.0159108 -0.0117064 -0.028822 -0.00928085 0.0656149 -0.0117098 0.271122 -0.00540271 -0.00378128 -0.00559278 0.0583675 0.0095132 0.0614521 -0.00208664 -0.022563 0.0163838 0.0370412 0.0332349 -0.000297686 -0.0229226 -0.0462697 0.0324189 0.00873659 -0.00263983 0.0160847 -0.0341718 0.00446792 -0.0442112 -0.0120727 0.227076 0.0505135 nan nan nan nan 0.0132967 -0.0106523 0.114447 0.00881176 0.116126 0.317129 -0.029503 -0.0145671 -0.0238053 0.0110842 0.0727621 -0.0225507 0.0238512 -0.0221875 0.0141552 0.0728438 0.0047266 1 -0.00531493 -0.266931 -0.0138684 -0.0247753 0.00218613 -0.00612248 -0.00436704 -0.00390932 0.00860913
num_tl_op_past_12m -0.012524 -0.012524 -0.0125626 0.0181777 0.214301 0.0132499 0.0352015 0.0520409 -0.0211688 nan 0.0566104 -0.0285821 -0.00379716 -0.0953209 -0.0953241 0.353596 0.0539345 0.0987125 0.391381 0.0823809 -0.0157294 -0.206963 0.354305 -0.053438 0.097447 -0.0241895 -0.0178815 -0.0170224 -0.00717039 0.720908 0.12235 0.561705 0.431285 0.0683615 0.156345 0.216616 0.837077 0.654754 -0.0968394 -0.0402328 0.228857 0.121665 0.448994 0.771942 -0.0495513 0.117169 -0.171062 -0.00218708 -0.00524767 nan nan nan nan 0.0666973 0.0373853 0.0341549 0.211114 0.0339703 0.0752611 0.197673 0.286416 0.267746 0.260238 0.194377 0.379347 0.329466 0.273814 0.389719 -0.00668362 -0.0061361 -0.00531493 1 0.0250933 -0.16699 0.0873331 0.0197826 -0.0148852 -0.00887584 -0.0251435 -0.00906105
pct_tl_nvr_dlq 0.0702641 0.0702641 0.0703008 0.0383303 -0.0472754 0.0553822 -0.0166497 0.000633011 0.00721334 nan 0.0640695 -0.440008 0.0884751 0.296871 0.296867 -0.022165 -0.608835 0.0368363 0.100041 0.0121686 0.0965034 -0.0403029 0.0289515 0.0182647 -0.559342 0.00370765 0.0012872 0.00900865 -0.0489672 0.00667259 -0.0109671 0.0264624 0.0651893 -0.0239216 0.00849149 -0.0211727 0.00695222 0.0397782 0.128772 -0.0762987 -0.017259 0.0470462 -0.0192305 0.0729495 -0.0435262 0.165385 -0.0324937 -0.0806482 -0.0171228 nan nan nan nan -0.0144858 0.0315371 -0.520245 -0.017555 -0.550831 -0.589166 0.109193 0.084511 0.13986 0.0655869 -0.00339816 0.130167 0.0533712 0.0898938 0.10137 -0.0202917 -0.0376677 -0.266931 0.0250933 1 -0.0186247 0.0854826 -0.0284664 0.0161255 0.00325997 -0.000611669 -0.0305037
percent_bc_gt_75 0.04447 0.04447 0.0444573 0.0400646 0.205924 0.0651471 0.0374231 0.000613847 -0.0597402 nan 0.133691 -0.00673088 -0.0353742 -0.399495 -0.399491 -0.0854127 0.0043547 -0.0300964 -0.0750634 -0.0295045 0.157006 0.723753 -0.0646244 -0.0279433 -0.00136059 0.0252186 0.0224391 0.0338727 -0.0262401 -0.151688 0.0425085 -0.0685178 -0.0520546 0.0133864 0.0168712 0.00629931 -0.155791 -0.153979 0.217872 0.478858 -0.0614376 -0.000464966 -0.103555 -0.154705 0.0709961 -0.411689 0.845272 -0.0101618 -0.00763145 nan nan nan nan 0.0126638 0.116542 -0.00849793 -0.0454136 -0.00396584 -0.0238768 0.0581039 0.111197 -0.133136 -0.145449 0.0173363 -0.123017 -0.112284 0.117669 -0.0758054 -0.0136826 -0.0183645 -0.0138684 -0.16699 -0.0186247 1 -0.0271813 -0.0128984 0.0131655 0.000574869 0.0229265 -0.00515201
pub_rec_bankruptcies -0.0722752 -0.0722752 -0.0723112 -0.0025567 0.0569179 -0.0633205 0.00503536 -0.0367303 0.0029982 nan -0.0127161 -0.064268 -0.0560401 -0.206976 -0.206973 0.0855617 -0.0888289 0.776127 -0.0134098 0.654163 -0.107666 -0.0690148 0.0234767 -0.0149067 -0.0395159 0.00528841 -0.0023287 0.00289668 -0.0115422 0.0558972 -0.0286686 0.0463047 0.0526413 0.0212014 -0.0283581 0.0326699 0.0796004 0.111202 -0.129389 -0.0158144 0.0731705 0.0172131 0.104239 0.114637 -0.0807372 -0.0997203 -0.018847 -0.0154123 -0.0029183 nan nan nan nan -0.0174074 0.00425672 -0.0823344 0.0656321 -0.0982129 -0.0468898 -0.056417 -0.00444145 -0.0644459 -0.00593851 -0.00935613 0.0181009 0.0486757 -0.00757742 -0.0192534 -0.00218899 -0.0144378 -0.0247753 0.0873331 0.0854826 -0.0271813 1 0.0347124 -0.00917239 -3.80428e-05 -0.00107613 -0.00949286
tax_liens 0.0132666 0.0132666 0.0132395 -0.00782063 0.0151899 0.018998 0.00531877 0.0350064 -0.0132262 nan -0.0257022 0.00593368 -0.0281559 -0.0617305 -0.0617298 0.0181919 0.0265877 0.299501 -0.0073593 0.686977 -0.010275 -0.00643908 -0.0252072 -0.00594051 0.019823 -0.00976738 -0.00788305 -0.0104434 0.00792196 0.0152902 -0.0122823 0.00654293 0.00415751 0.00247218 0.000789849 -0.00220551 0.0217304 0.0252402 -0.0217796 -0.00264618 0.0202849 -0.014358 0.0228439 0.0208618 -0.00148414 -0.0197553 -0.00370829 -0.000991514 0.0054966 nan nan nan nan -0.0178562 0.00565532 0.0113915 0.0119967 0.00937857 0.00973285 0.0133264 0.00899694 0.00332998 -0.010921 -0.0159769 0.000311432 -0.0196402 0.010145 -0.00727364 0.00807619 0.00583894 0.00218613 0.0197826 -0.0284664 -0.0128984 0.0347124 1 -0.00935712 -0.0083721 -0.00923122 -0.00434655
revol_bal_joint 0.0917166 0.0917166 0.0917409 0.0546446 0.0305633 0.0794495 -0.0640999 -0.0211773 0.153795 nan 0.157884 -0.00670418 -0.00784146 0.0357951 0.0357942 -0.0193547 -0.0151579 -0.0158114 0.0107744 -0.0161148 0.0350645 0.0198315 0.0101585 0.0176648 -0.020143 0.572076 0.634224 0.610817 -0.00329687 -0.0126559 0.00421186 -0.00521119 -0.00377276 -0.000141324 0.0222411 -0.0121975 -0.0186303 -0.020129 0.0469201 0.00676926 0.00407757 0.0245087 -0.00750618 -0.0135357 0.0469711 0.0149981 0.0106302 -0.0034651 -0.00196265 nan nan nan nan 0.0361079 -0.00607725 -0.0135996 -0.00719881 -0.0130986 -0.0138636 0.00282292 0.0039281 0.000639426 -0.00109308 0.00496934 0.00652393 0.00282971 0.00735437 0.0113808 -0.00224183 -0.00191861 -0.00612248 -0.0148852 0.0161255 0.0131655 -0.00917239 -0.00935712 1 0.629213 0.740915 0.375035
sec_app_mort_acc 0.0654644 0.0654644 0.0654809 0.0462316 0.00964099 0.0522894 -0.0570702 -0.0262688 0.135667 nan 0.12818 -0.00149503 -0.0099418 0.0349613 0.0349624 -0.0154953 -0.00237543 -0.00506234 -0.0037312 -0.0083866 0.00766717 0.00492423 0.0147019 0.0160926 -0.00844678 0.504644 0.525145 0.473685 -0.00300911 -0.00728085 -0.00247352 -0.00469154 -0.00241752 0.00206934 0.0140907 -0.0122938 -0.0179965 -0.0217354 0.0172944 -0.00215361 0.0118063 0.0345538 0.00157327 -0.00818275 0.0618051 0.00844124 -0.00364648 -0.00369874 -0.00178863 nan nan nan nan 0.0868426 -0.0114841 -0.00428139 0.00214206 -0.0033705 -0.00630334 -0.016675 -0.0146125 -0.0159482 -0.00938568 0.00691429 -0.00947076 -0.00336047 -0.0124839 -0.00345781 -0.00236635 -0.0017072 -0.00436704 -0.00887584 0.00325997 0.000574869 -3.80428e-05 -0.0083721 0.629213 1 0.598185 0.35519
sec_app_revol_util 0.0651616 0.0651616 0.065188 0.0512744 0.0466987 0.0577813 -0.0825761 -0.0391232 0.181264 nan 0.156272 -0.00247188 0.0119592 0.0193274 0.0193286 -0.021927 -0.00673003 -0.00774973 -0.0234721 -0.0103497 0.00317158 0.0295645 -0.020617 0.0132245 -0.0127033 0.674253 0.628761 0.655014 -0.00518892 -0.0204226 -0.00348556 -0.0112569 -0.0122527 -0.00470371 0.00881707 -0.0133814 -0.0269882 -0.0307799 0.00881723 0.016229 0.00628767 0.0207548 -0.00676538 -0.0270999 0.0416053 -0.0193241 0.0180859 -0.00350607 -0.00237695 nan nan nan nan 0.0186783 -0.0225751 -0.00994155 -0.00855814 -0.00786351 -0.00817286 -0.0271403 -0.0214485 -0.0355712 -0.0375617 -0.00402448 -0.0276161 -0.0300403 -0.0186286 -0.0229484 -0.00320265 -0.00297742 -0.00390932 -0.0251435 -0.000611669 0.0229265 -0.00107613 -0.00923122 0.740915 0.598185 1 0.543998
sec_app_mths_since_last_major_derog 0.025622 0.025622 0.0256408 0.0269252 0.0322549 0.0232078 -0.0539475 -0.0272948 0.120069 nan 0.0887798 0.0117696 0.0037996 0.00648239 0.00648387 -0.00925314 0.0257454 -0.0118531 -0.0122987 -0.0104793 -0.0122944 -0.00544781 -0.0080504 0.00580819 0.0311284 0.446625 0.409268 0.406984 -0.00104258 -0.00799011 -0.000702324 -0.00325605 -0.00416827 -0.00408784 0.00681597 -0.00771851 -0.0106544 -0.0121039 -0.0134969 -0.00191463 0.0133634 0.00824441 0.00452644 -0.0106693 0.0160454 -0.00872533 -0.00833916 0.0031394 -0.00176629 nan nan nan nan 0.00742479 -0.014197 0.016833 -0.000433285 0.0195779 0.0199888 -0.0203711 -0.0174936 -0.022067 -0.0194202 0.000384541 -0.0144157 -0.0132694 -0.0158036 -0.0119976 -0.000394056 -5.5049e-05 0.00860913 -0.00906105 -0.0305037 -0.00515201 -0.00949286 -0.00434655 0.375035 0.35519 0.543998 1
  • Remove the following variables, since their entries in the correlation matrix above are all NaN, so they carry no usable information:
    • mo_sin_old_il_acct
    • mo_sin_old_rev_tl_op
    • mo_sin_rcnt_rev_tl_op
    • mo_sin_rcnt_tl
    • pymnt_plan
  • make_scatterplot (below) can be used to take a closer look at pairs of variables that the analysis above showed to be highly correlated. Based on the correlation analysis, we also decided to remove the following columns:
    • funded_amnt
    • funded_amnt_inv
    • num_sats
    • application_type
    • num_actv_rev_tl
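Highly correlated pairs can also be listed programmatically rather than read off the matrix by eye. The following is a minimal sketch and not part of the original notebook: the helper high_corr_pairs, the 0.9 threshold, and the synthetic values are all assumptions chosen for illustration, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(df, threshold=0.9):
    """Hypothetical helper: list column pairs with |correlation| >= threshold."""
    corr = df.corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):   # upper triangle only, skips the diagonal
            value = corr.iloc[i, j]
            if pd.notna(value) and abs(value) >= threshold:
                pairs.append((cols[i], cols[j], float(value)))
    # Strongest correlations first
    return sorted(pairs, key=lambda p: -abs(p[2]))

# Synthetic data: funded_amnt duplicates loan_amnt, dti is independent
rng = np.random.default_rng(0)
loan = rng.uniform(1000, 40000, size=500)
toy = pd.DataFrame({
    "loan_amnt": loan,
    "funded_amnt": loan,
    "dti": rng.uniform(0, 40, size=500),
})

pairs = high_corr_pairs(toy)
print(pairs)
```

Each pair returned this way is a candidate for a closer look with a scatter plot before deciding which of the two columns to drop.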
In [7]:
import matplotlib.pyplot as plt

def make_scatterplot(x, y):
    '''Plots a scatter plot of two variables
    Inputs:
        x, y: pd.Series objects representing the x and y values to plot
    '''
    fig = plt.figure(figsize=(8,5))
    plt.scatter(x, y, alpha=0.8)
    plt.title('Compare two variables', fontsize=16)
    # Axis labels are taken from the Series names
    plt.xlabel(x.name, fontsize=14)
    plt.ylabel(y.name, fontsize=14)
    # No legend: there is only one unlabeled series in the plot
    plt.show()

Next we encode the ordinal and nominal categorical variables and strip the trailing "xx" from zip_code:

In [8]:
object_columns = []
for column in original_df:
    if original_df[column].dtype == 'object':
        object_columns.append(column)
object_columns
Out[8]:
['grade',
 'sub_grade',
 'home_ownership',
 'verification_status',
 'loan_status',
 'purpose',
 'zip_code',
 'addr_state']
In [11]:
# One function to do all of the above operations
def data_prep(df):
    '''Returns a cleaned up dataframe as basis for the EDA analysis
    Input:
        df: the pd.DataFrame object
    Returns:
        clean_df: pd.DataFrame object
    '''
    clean_df = df.copy()
    # Definitions
    cols_to_remove = ['mo_sin_old_il_acct', 'mo_sin_old_rev_tl_op', 'mo_sin_rcnt_rev_tl_op',
                      'mo_sin_rcnt_tl', 'funded_amnt', 'funded_amnt_inv', 'num_sats',
                   'application_type', 'num_actv_rev_tl', 'pymnt_plan']
    nominal_columns = ['home_ownership', 'verification_status', 'purpose', 'addr_state']
    prefixes = ['home', 'verify', 'purp', 'state']

    # Drop additional uninformative columns
    clean_df = clean_df.drop(columns=cols_to_remove)

    # strip xx from zip code
    clean_df["zip_code"] = [x.strip("xx") for x in clean_df["zip_code"].astype(str)]
    clean_df["zip_code"] = clean_df["zip_code"].astype(int)

    # exclude zip code from set of predictors for analysis
    # to be added back in later for checking discrimination...
    clean_df = clean_df.drop(columns='zip_code')


    # ordinal columns are encoded as numerical values, as there is an order
    # (assign the result back instead of using inplace=True on a column selection)
    clean_df["grade"] = clean_df["grade"].replace({"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7})
    clean_df["sub_grade"] = clean_df["sub_grade"].replace({"A1": 1, "A2": 2, "A3": 3, "A4": 4, "A5": 5,
                                "B1": 6, "B2": 7, "B3": 8, "B4": 9, "B5": 10,
                                "C1": 11, "C2": 12, "C3": 13, "C4": 14, "C5": 15,
                                "D1": 16, "D2": 17, "D3": 18, "D4": 19, "D5": 20,
                                "E1": 21, "E2": 22, "E3": 23, "E4": 24, "E5": 25,
                                "F1": 26, "F2": 27, "F3": 28, "F4": 29, "F5": 30,
                                "G1": 31, "G2": 32, "G3": 33, "G4": 34, "G5": 35})

    # nominal columns are one-hot encoded, which adds one column per category (minus the first)
    clean_df = pd.get_dummies(clean_df, columns=nominal_columns, prefix=prefixes, drop_first=True)

    return clean_df
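The grade and sub_grade dictionaries in data_prep follow a regular pattern: seven grades, five sub-grades each, numbered consecutively. As a hedged alternative to typing them out, the sketch below generates equivalent mappings. The names grades, grade_map and sub_grade_map are illustrative and do not appear in the original notebook.

```python
# Grades A..G map to 1..7
grades = "ABCDEFG"
grade_map = {g: i for i, g in enumerate(grades, start=1)}

# Sub-grades A1..G5 map to 1..35: five consecutive codes per grade
sub_grade_map = {
    f"{g}{n}": (grade_map[g] - 1) * 5 + n
    for g in grades
    for n in range(1, 6)
}

print(grade_map["C"], sub_grade_map["C3"])
```

Generating the mapping avoids transcription slips in a 35-entry literal, at the cost of hiding the explicit table from the reader.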
In [12]:
df_all = data_prep(original_df)
In [13]:
df_all.shape
Out[13]:
(334109, 140)
Part 2: Resample to achieve balanced classes

Because the classes are imbalanced, accuracy-based assessments of model performance can be misleading, so we resample the classes here. We also take the opportunity to reduce the dataset size in the initial stages, so that different model specifications can be tested faster.

In [14]:
# Check balance of target values - it is unbalanced
df_all["loan_status"].value_counts().to_frame()
Out[14]:
loan_status
Fully Paid 255116
Charged Off 78993
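To make the imbalance concrete, the short sketch below uses only the two counts printed above to compute the accuracy of a trivial classifier that always predicts "Fully Paid". The variable names are illustrative and the calculation is not part of the original notebook.

```python
# Class counts taken from the value_counts() output above
fully_paid = 255116
charged_off = 78993
total = fully_paid + charged_off

# A model that always predicts the majority class is right this often
baseline_accuracy = fully_paid / total
minority_share = charged_off / total

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"minority share:    {minority_share:.3f}")
```

Any model scoring around 76% accuracy on the unbalanced data is therefore no better than this constant prediction, which is one reason to evaluate on balanced classes.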
In [15]:
# Downsample and create balanced classes
def balance_classes(df, n_samples):
    # Define majority ("Fully Paid") and minority ("Charged Off") classes
    df_majority = df[df["loan_status"] == "Fully Paid"]
    df_minority = df[df["loan_status"] == "Charged Off"]

    # Downsample majority class
    df_majority_downsampled = resample(df_majority,
                                     replace = False,    # sample without replacement
                                     n_samples = n_samples,
                                     random_state = 1) # set random seed for reproducibility

    # Downsample minority class
    df_minority_downsampled = resample(df_minority,
                                     replace = False,
                                     n_samples = n_samples,
                                     random_state = 1)

    # Recombine
    df_downsampled = pd.concat([df_majority_downsampled, df_minority_downsampled])

    return df_downsampled
In [16]:
# Define how many samples of each we want
n_samples = 10000

df = balance_classes(df_all, n_samples)
In [17]:
df["loan_status"].value_counts().to_frame()
Out[17]:
loan_status
Fully Paid 10000
Charged Off 10000
In [18]:
display(df.describe())
loan_amnt term int_rate installment grade sub_grade emp_length annual_inc issue_d dti delinq_2yrs earliest_cr_line fico_range_low fico_range_high inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status mths_since_last_major_derog annual_inc_joint dti_joint acc_now_delinq open_acc_6m open_act_il open_il_12m open_il_24m mths_since_rcnt_il total_bal_il il_util open_rv_12m open_rv_24m max_bal_bc all_util inq_fi total_cu_tl inq_last_12m acc_open_past_24mths avg_cur_bal bc_open_to_buy bc_util chargeoff_within_12_mths delinq_amnt mort_acc mths_since_recent_bc mths_since_recent_bc_dlq mths_since_recent_inq mths_since_recent_revol_delinq num_accts_ever_120_pd num_actv_bc_tl num_bc_sats num_bc_tl num_il_tl num_op_rev_tl num_rev_accts num_rev_tl_bal_gt_0 num_tl_120dpd_2m num_tl_30dpd num_tl_90g_dpd_24m num_tl_op_past_12m pct_tl_nvr_dlq percent_bc_gt_75 pub_rec_bankruptcies tax_liens revol_bal_joint sec_app_mort_acc sec_app_revol_util sec_app_mths_since_last_major_derog home_MORTGAGE home_NONE home_OWN home_RENT verify_Source Verified verify_Verified purp_credit_card purp_debt_consolidation purp_home_improvement purp_house purp_major_purchase purp_medical purp_moving purp_other purp_renewable_energy purp_small_business purp_vacation purp_wedding state_AL state_AR state_AZ state_CA state_CO state_CT state_DC state_DE state_FL state_GA state_HI state_ID state_IL state_IN state_KS state_KY state_LA state_MA state_MD state_ME state_MI state_MN state_MO state_MS state_MT state_NC state_ND state_NE state_NH state_NJ state_NM state_NV state_NY state_OH state_OK state_OR state_PA state_RI state_SC state_SD state_TN state_TX state_UT state_VA state_VT state_WA state_WI state_WV state_WY
count 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 2.000000e+04 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.0000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.000000 20000.00000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.000000 20000.000000 20000.0 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.00000 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000 20000.00000 20000.00000
mean 14845.695000 0.262550 14.527035 459.439889 3.029100 13.132300 5.504275 7.711921e+04 2016.316400 19.428785 0.354800 2000.068650 693.31600 697.316150 0.647250 0.527850 0.210200 11.869650 0.266850 15585.475900 49.242785 24.716200 0.740250 0.299300 3802.495215 0.688934 0.007000 1.108800 2.856100 0.837950 1.836950 0.973600 35986.670600 63.984300 1.492350 3.204150 5390.908650 60.317750 1.150700 1.649400 2.472950 5.317800 12935.132350 10274.992550 56.271810 0.008900 22.014000 1.479250 0.988350 0.25425 0.923100 0.360900 0.568700 3.609100 4.767800 7.636500 8.88310 8.331300 14.1224 5.547100 0.000750 0.004350 0.094450 2.472500 93.738685 41.245695 0.165000 0.065550 593.181250 0.032750 1.173845 0.008250 0.462800 0.000050 0.123400 0.41280 0.414300 0.317100 0.19315 0.580150 0.073500 0.005400 0.024150 0.014450 0.009050 0.06930 0.000850 0.012700 0.007900 0.0 0.011600 0.008550 0.025000 0.142600 0.02075 0.013850 0.002050 0.003550 0.078100 0.031900 0.00550 0.003500 0.03630 0.018050 0.007700 0.009350 0.011400 0.02325 0.022850 0.002850 0.025000 0.018150 0.014600 0.006150 0.002550 0.030300 0.002450 0.005450 0.003950 0.034750 0.006600 0.014900 0.080750 0.032050 0.009400 0.010200 0.033100 0.004300 0.010750 0.002250 0.016350 0.084300 0.007500 0.025750 0.001850 0.019450 0.013000 0.00155 0.00155
std 9175.396697 0.440031 5.564589 285.921531 1.344263 6.717764 3.848871 7.514298e+04 0.465083 11.036975 0.949717 7.652501 31.21859 31.219321 0.912228 0.499236 0.407461 5.732146 0.661712 21401.815729 24.738491 12.108133 0.438508 0.457963 21979.555941 3.892103 0.092474 1.242191 3.026838 1.020462 1.722386 0.160326 41740.899612 31.802208 1.625753 2.823521 5678.146806 20.640692 1.648886 2.858179 2.629261 3.487436 15427.656235 14876.425265 29.491357 0.105931 890.362325 1.814479 0.107307 0.43545 0.266439 0.480274 1.449547 2.318095 3.077362 4.673239 7.52271 4.768383 8.1751 3.325122 0.027377 0.069508 0.519849 2.030751 9.114103 36.310992 0.411198 0.446389 5510.168982 0.347683 9.105016 0.090456 0.498627 0.007071 0.328904 0.49235 0.492613 0.465358 0.39478 0.493547 0.260962 0.073288 0.153519 0.119339 0.094702 0.25397 0.029143 0.111979 0.088532 0.0 0.107079 0.092072 0.156129 0.349673 0.14255 0.116871 0.045232 0.059478 0.268336 0.175738 0.07396 0.059059 0.18704 0.133136 0.087413 0.096245 0.106163 0.15070 0.149429 0.053311 0.156129 0.133497 0.119948 0.078182 0.050434 0.171416 0.049438 0.073625 0.062726 0.183151 0.080974 0.121156 0.272458 0.176137 0.096499 0.100481 0.178902 0.065435 0.103126 0.047382 0.126821 0.277844 0.086279 0.158393 0.042973 0.138104 0.113277 0.03934 0.03934
min 1000.000000 0.000000 5.320000 30.650000 1.000000 1.000000 0.000000 0.000000e+00 2016.000000 0.000000 0.000000 1956.000000 660.00000 664.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 2.0000 0.000000 0.000000 0.000000 0.000000 0.000000 15.400000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.00000
25% 8000.000000 0.000000 10.750000 252.542500 2.000000 8.000000 2.000000 4.600000e+04 2016.000000 12.520000 0.000000 1996.000000 670.00000 674.000000 0.000000 0.000000 0.000000 8.000000 0.000000 5537.750000 30.300000 16.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000 10432.500000 50.000000 0.000000 1.000000 2167.750000 47.000000 0.000000 0.000000 1.000000 3.000000 3052.750000 1601.750000 32.900000 0.000000 0.000000 0.000000 1.000000 0.00000 1.000000 0.000000 0.000000 2.000000 3.000000 4.000000 4.00000 5.000000 8.0000 3.000000 0.000000 0.000000 0.000000 1.000000 90.900000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.00000
50% 12500.000000 0.000000 13.590000 382.680000 3.000000 12.000000 5.000000 6.500000e+04 2016.000000 18.670000 0.000000 2002.000000 685.00000 689.000000 0.000000 1.000000 0.000000 11.000000 0.000000 10414.500000 49.000000 23.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 2.000000 1.000000 1.000000 1.000000 24998.500000 73.000000 1.000000 3.000000 4029.500000 62.000000 1.000000 0.000000 2.000000 5.000000 6801.500000 5053.500000 58.500000 0.000000 0.000000 1.000000 1.000000 0.00000 1.000000 0.000000 0.000000 3.000000 4.000000 7.000000 7.00000 7.000000 12.0000 5.000000 0.000000 0.000000 0.000000 2.000000 97.100000 33.300000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.00000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.00000
75% 20000.000000 1.000000 17.990000 612.942500 4.000000 17.000000 10.000000 9.100000e+04 2017.000000 25.540000 0.000000 2005.000000 710.00000 714.000000 1.000000 1.000000 0.000000 15.000000 0.000000 18528.250000 67.800000 31.250000 1.000000 1.000000 0.000000 0.000000 0.000000 2.000000 3.000000 1.000000 3.000000 1.000000 46629.250000 87.000000 2.000000 4.000000 6968.000000 75.000000 2.000000 2.000000 3.000000 7.000000 17941.750000 12665.500000 82.100000 0.000000 0.000000 2.000000 1.000000 1.00000 1.000000 1.000000 1.000000 5.000000 6.000000 10.000000 12.00000 11.000000 18.0000 7.000000 0.000000 0.000000 0.000000 3.000000 100.000000 66.700000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 1.00000 1.000000 1.000000 0.00000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.00000
max 40000.000000 1.000000 30.990000 1569.110000 7.000000 35.000000 10.000000 6.693021e+06 2017.000000 490.070000 17.000000 2014.000000 845.00000 850.000000 5.000000 1.000000 1.000000 71.000000 28.000000 598769.000000 162.000000 109.000000 1.000000 1.000000 434000.000000 61.280000 4.000000 14.000000 36.000000 11.000000 20.000000 1.000000 655706.000000 268.000000 20.000000 45.000000 361299.000000 162.000000 28.000000 48.000000 29.000000 49.000000 208984.000000 231418.000000 182.300000 4.000000 65000.000000 17.000000 1.000000 1.00000 1.000000 1.000000 34.000000 23.000000 36.000000 51.000000 80.00000 61.000000 92.0000 36.000000 1.000000 2.000000 16.000000 23.000000 100.000000 100.000000 5.000000 27.000000 184999.000000 9.000000 110.700000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 0.0 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1.00000
Part 3: Scale the data and generate train/test splits

Since we will be using models that are sensitive to feature scale, we need to standardize the data. This is done in a function for reproducibility.
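The sketch below illustrates the scaling convention on toy numbers and is not taken from the original notebook. It standardizes with the mean and population standard deviation (ddof=0) of the training values only, which is the convention StandardScaler uses, and applies the same statistics to the test values. It also shows why a describe() call on scaled training data reports a standard deviation slightly above 1: pandas uses the sample standard deviation (ddof=1), which differs by a factor of sqrt(n / (n - 1)).

```python
import math
import statistics

# Toy training and test values (made up for illustration)
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [2.0, 6.0]

# Statistics are estimated on the training data only
mean = statistics.mean(train)
pop_std = statistics.pstdev(train)   # ddof=0

# The same mean and std are applied to both sets
train_scaled = [(x - mean) / pop_std for x in train]
test_scaled = [(x - mean) / pop_std for x in test]

# Sample std (ddof=1) of the scaled training data is sqrt(n / (n - 1)), not exactly 1
sample_std_scaled = statistics.stdev(train_scaled)
expected = math.sqrt(len(train) / (len(train) - 1))

# For a training set of 16000 rows the same factor is very close to 1
ratio_16000 = math.sqrt(16000 / 15999)
print(sample_std_scaled, expected, ratio_16000)
```

With 16000 training rows the factor is about 1.000031, which is consistent with the std row shown for the scaled columns in the describe() output later in this part.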

In [21]:
# Function to generate train test splits and scale the data
def get_train_test(df, test_size = 0.2):
    nonbinary_columns = ['loan_amnt','int_rate','installment','grade','sub_grade','emp_length',
                         'annual_inc','issue_d','dti', 'delinq_2yrs','earliest_cr_line','inq_last_6mths',
                         'open_acc','pub_rec','revol_bal','revol_util','total_acc', 'annual_inc_joint',
                         'dti_joint','acc_now_delinq','open_acc_6m','open_act_il', 'open_il_12m',
                         'open_il_24m', 'total_bal_il','il_util','open_rv_12m','open_rv_24m','max_bal_bc',
                         'all_util','inq_fi','total_cu_tl','inq_last_12m','acc_open_past_24mths',
                         'avg_cur_bal','bc_open_to_buy','bc_util','chargeoff_within_12_mths',
                         'delinq_amnt','mort_acc','num_accts_ever_120_pd', 'num_actv_bc_tl',
                         'num_bc_sats','num_bc_tl', 'num_il_tl', 'num_op_rev_tl', 'num_rev_accts',
                         'num_rev_tl_bal_gt_0', 'num_tl_120dpd_2m', 'num_tl_30dpd', 'num_tl_90g_dpd_24m',
                         'num_tl_op_past_12m', 'pct_tl_nvr_dlq', 'percent_bc_gt_75', 'pub_rec_bankruptcies',
                         'tax_liens', 'revol_bal_joint', 'sec_app_mort_acc', 'sec_app_revol_util',
                         'sec_app_mths_since_last_major_derog']
    result = df.copy()

    # Train test split
    data_train, data_test = train_test_split(result, test_size = test_size, random_state = 1)

    # Split into x and y
    X_train = data_train.drop(columns='loan_status')   # drop() returns a new frame, safe to modify
    y_train = data_train['loan_status']
    X_test = data_test.drop(columns='loan_status')
    y_test = data_test['loan_status']

    # Incorporate scaling here so that we can use it consistently across all models
    scaler = StandardScaler()
    X_train[nonbinary_columns] = scaler.fit_transform(X_train[nonbinary_columns])
    X_test[nonbinary_columns] = scaler.transform(X_test[nonbinary_columns])

    return X_train, y_train, X_test, y_test
In [22]:
# Create scaled train and test sets
X_train, y_train, X_test, y_test = get_train_test(df)
/Users/rs/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py:617: DataConversionWarning:

Data with input dtype int64, float64 were all converted to float64 by StandardScaler.

/Users/rs/anaconda3/lib/python3.6/site-packages/sklearn/base.py:462: DataConversionWarning:

Data with input dtype int64, float64 were all converted to float64 by StandardScaler.

/Users/rs/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:30: DataConversionWarning:

Data with input dtype int64, float64 were all converted to float64 by StandardScaler.

In [23]:
# Check that data has scaled correctly
display(X_train.describe())
loan_amnt term int_rate installment grade sub_grade emp_length annual_inc issue_d dti delinq_2yrs earliest_cr_line fico_range_low fico_range_high inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status mths_since_last_major_derog annual_inc_joint dti_joint acc_now_delinq open_acc_6m open_act_il open_il_12m open_il_24m mths_since_rcnt_il total_bal_il il_util open_rv_12m open_rv_24m max_bal_bc all_util inq_fi total_cu_tl inq_last_12m acc_open_past_24mths avg_cur_bal bc_open_to_buy bc_util chargeoff_within_12_mths delinq_amnt mort_acc mths_since_recent_bc mths_since_recent_bc_dlq mths_since_recent_inq mths_since_recent_revol_delinq num_accts_ever_120_pd num_actv_bc_tl num_bc_sats num_bc_tl num_il_tl num_op_rev_tl num_rev_accts num_rev_tl_bal_gt_0 num_tl_120dpd_2m num_tl_30dpd num_tl_90g_dpd_24m num_tl_op_past_12m pct_tl_nvr_dlq percent_bc_gt_75 pub_rec_bankruptcies tax_liens revol_bal_joint sec_app_mort_acc sec_app_revol_util sec_app_mths_since_last_major_derog home_MORTGAGE home_NONE home_OWN home_RENT verify_Source Verified verify_Verified purp_credit_card purp_debt_consolidation purp_home_improvement purp_house purp_major_purchase purp_medical purp_moving purp_other purp_renewable_energy purp_small_business purp_vacation purp_wedding state_AL state_AR state_AZ state_CA state_CO state_CT state_DC state_DE state_FL state_GA state_HI state_ID state_IL state_IN state_KS state_KY state_LA state_MA state_MD state_ME state_MI state_MN state_MO state_MS state_MT state_NC state_ND state_NE state_NH state_NJ state_NM state_NV state_NY state_OH state_OK state_OR state_PA state_RI state_SC state_SD state_TN state_TX state_UT state_VA state_VT state_WA state_WI state_WV state_WY
count 1.600000e+04 16000.000000 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 16000.000000 16000.000000 1.600000e+04 16000.000000 16000.000000 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 16000.000000 16000.000000 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 16000.000000 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 16000.000000 16000.000000 16000.000000 16000.000000 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 1.600000e+04 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.0 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.00000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000 16000.000000
mean -8.243406e-17 0.261000 -2.559480e-16 -8.903989e-17 2.926548e-16 -2.555595e-17 -4.130030e-17 4.838838e-17 -3.533851e-14 -2.584443e-16 3.158099e-16 -4.493956e-15 693.165000 697.165125 9.364731e-17 0.525750 0.210687 -2.892131e-17 -5.148312e-16 -3.172809e-17 -1.241507e-16 -2.401378e-17 0.739062 0.297938 7.833734e-16 -2.640110e-16 -9.863187e-16 5.351553e-16 -3.720635e-17 1.578598e-16 1.020677e-16 0.972625 -7.804174e-17 8.437695e-18 1.402697e-16 8.307331e-17 3.383210e-17 -6.251943e-17 -5.439260e-16 -6.276507e-16 6.811218e-17 -8.104975e-17 2.676678e-17 6.170064e-17 -9.912210e-17 4.843348e-17 3.394441e-15 7.420106e-17 0.987875 0.251688 0.922937 0.359812 -4.464901e-16 3.790787e-16 7.372228e-17 -4.350340e-16 1.106060e-17 2.710939e-16 1.001560e-16 1.829890e-16 -1.922783e-15 6.281260e-17 -2.092170e-15 3.892719e-17 -9.751215e-16 -9.632573e-17 -9.729093e-16 9.517283e-16 1.691425e-15 7.864820e-16 -1.955325e-15 6.443058e-16 0.464062 0.000063 0.121250 0.413562 0.414563 0.316000 0.195500 0.577625 0.072938 0.004938 0.023875 0.014875 0.009062 0.069125 0.000812 0.013375 0.008563 0.0 0.011875 0.008937 0.024438 0.143750 0.020438 0.014063 0.002063 0.003375 0.078375 0.032188 0.005687 0.003563 0.035937 0.018062 0.008125 0.009500 0.011750 0.023187 0.023438 0.002875 0.024750 0.018500 0.014313 0.005750 0.00275 0.030187 0.002500 0.005188 0.003812 0.034125 0.005625 0.015187 0.079250 0.033188 0.008563 0.009938 0.033188 0.004500 0.010812 0.002000 0.017063 0.085125 0.007063 0.025437 0.001813 0.019125 0.013062 0.001250 0.001500
std 1.000031e+00 0.439194 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 31.093148 31.093760 1.000031e+00 0.499352 0.407810 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 0.439160 0.457366 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 0.163179 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 0.109447 0.433996 0.266699 0.479960 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 1.000031e+00 0.498722 0.007906 0.326428 0.492487 0.492662 0.464927 0.396598 0.493953 0.260042 0.070096 0.152664 0.121056 0.094768 0.253675 0.028494 0.114878 0.092140 0.0 0.108327 0.094118 0.154408 0.350847 0.141496 0.117752 0.045369 0.057998 0.268769 0.176503 0.075203 0.059582 0.186140 0.133182 0.089775 0.097007 0.107762 0.150503 0.151293 0.053544 0.155367 0.134755 0.118779 0.075613 0.05237 0.171108 0.049939 0.071839 0.061630 0.181556 0.074791 0.122302 0.270137 0.179132 0.092140 0.099194 0.179132 0.066933 0.103423 0.044678 0.129508 0.279076 0.083744 0.157455 0.042536 0.136969 0.113546 0.035334 0.038702
min -1.506594e+00 0.000000 -1.660629e+00 -1.497147e+00 -1.517425e+00 -1.814905e+00 -1.432943e+00 -9.979486e-01 -6.778308e-01 -1.909580e+00 -3.763290e-01 -5.651713e+00 660.000000 664.000000 -7.091658e-01 0.000000 0.000000 -1.898702e+00 -3.977398e-01 -7.186158e-01 -1.999024e+00 -1.879410e+00 0.000000 0.000000 -1.708697e-01 -1.754900e-01 -7.584839e-02 -8.890935e-01 -9.382181e-01 -8.193343e-01 -1.056480e+00 0.000000 -8.641200e-01 -2.008902e+00 -9.205974e-01 -1.141842e+00 -1.047570e+00 -2.921467e+00 -7.002819e-01 -5.711767e-01 -9.439492e-01 -1.522478e+00 -8.340071e-01 -6.903186e-01 -1.913551e+00 -8.331548e-02 -2.507871e-02 -8.131511e-01 0.000000 0.000000 0.000000 0.000000 -3.939124e-01 -1.566318e+00 -1.561550e+00 -1.638997e+00 -1.183329e+00 -1.757034e+00 -1.489415e+00 -1.679084e+00 -2.500782e-02 -6.425621e-02 -1.868231e-01 -1.214740e+00 -8.624141e+00 -1.137415e+00 -4.018758e-01 -1.453849e-01 -1.068134e-01 -9.320399e-02 -1.259621e-01 -9.015560e-02 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% -7.444671e-01 0.000000 -6.837144e-01 -7.229014e-01 -7.716013e-01 -7.704628e-01 -9.128776e-01 -4.012448e-01 -6.778308e-01 -6.748075e-01 -3.763290e-01 -5.357199e-01 670.000000 674.000000 -7.091658e-01 0.000000 0.000000 -6.755860e-01 -3.977398e-01 -4.622053e-01 -7.626065e-01 -7.198970e-01 0.000000 0.000000 -1.708697e-01 -1.754900e-01 -7.584839e-02 -8.890935e-01 -6.092063e-01 -8.193343e-01 -4.801920e-01 1.000000 -6.133488e-01 -4.374109e-01 -9.205974e-01 -7.865363e-01 -6.240137e-01 -6.468691e-01 -7.002819e-01 -5.711767e-01 -5.623436e-01 -6.644114e-01 -6.366822e-01 -5.828766e-01 -7.887079e-01 -8.331548e-02 -2.507871e-02 -8.131511e-01 1.000000 0.000000 1.000000 0.000000 -3.939124e-01 -6.987472e-01 -5.774057e-01 -7.815426e-01 -6.480487e-01 -7.034194e-01 -7.538447e-01 -7.730932e-01 -2.500782e-02 -6.425621e-02 -1.868231e-01 -7.239486e-01 -3.181981e-01 -1.137415e+00 -4.018758e-01 -1.453849e-01 -1.068134e-01 -9.320399e-02 -1.259621e-01 -9.015560e-02 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% -2.640553e-01 0.000000 -1.727680e-01 -2.662447e-01 -2.577753e-02 -2.443249e-02 -1.327792e-01 -1.547802e-01 -6.778308e-01 -6.776001e-02 -3.763290e-01 2.513560e-01 685.000000 689.000000 -7.091658e-01 1.000000 0.000000 -1.513934e-01 -3.977398e-01 -2.393893e-01 -1.264820e-02 -1.401406e-01 1.000000 0.000000 -1.708697e-01 -1.754900e-01 -7.584839e-02 -9.044675e-02 -2.801946e-01 1.602180e-01 -4.801920e-01 1.000000 -2.602490e-01 2.854752e-01 -3.053276e-01 -7.592443e-02 -2.616205e-01 7.906649e-02 -9.482905e-02 -5.711767e-01 -1.807380e-01 -9.236726e-02 -3.980944e-01 -3.480124e-01 7.446455e-02 -8.331548e-02 -2.507871e-02 -2.600106e-01 1.000000 0.000000 1.000000 0.000000 -3.939124e-01 -2.649617e-01 -2.493576e-01 -1.384520e-01 -2.465885e-01 -2.819736e-01 -1.408694e-01 -1.690994e-01 -2.500782e-02 -6.425621e-02 -1.868231e-01 -2.331568e-01 3.748806e-01 -2.190845e-01 -4.018758e-01 -1.453849e-01 -1.068134e-01 -9.320399e-02 -1.259621e-01 -9.015560e-02 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 5.620352e-01 1.000000 6.188390e-01 5.424971e-01 7.200462e-01 5.723918e-01 1.167385e+00 1.695154e-01 1.475295e+00 6.091988e-01 -3.763290e-01 6.448939e-01 710.000000 714.000000 3.895708e-01 1.000000 0.000000 5.475301e-01 -3.977398e-01 1.338768e-01 7.494716e-01 5.224382e-01 1.000000 1.000000 -1.708697e-01 -1.754900e-01 -7.584839e-02 7.082000e-01 4.881711e-02 1.602180e-01 6.723841e-01 1.000000 2.593788e-01 7.254928e-01 3.099421e-01 2.793815e-01 3.093317e-01 7.082107e-01 5.106238e-01 1.278043e-01 2.008677e-01 4.796769e-01 3.257373e-01 1.628211e-01 8.730690e-01 -8.331548e-02 -2.507871e-02 2.931299e-01 1.000000 1.000000 1.000000 1.000000 2.999771e-01 6.026095e-01 4.067387e-01 5.046385e-01 4.225117e-01 5.609180e-01 4.721059e-01 4.348944e-01 -2.500782e-02 -6.425621e-02 -1.868231e-01 2.576350e-01 6.829156e-01 7.020041e-01 -4.018758e-01 -1.453849e-01 -1.068134e-01 -9.320399e-02 -1.259621e-01 -9.015560e-02 1.000000 0.000000 0.000000 1.000000 1.000000 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 2.739539e+00 1.000000 2.957678e+00 3.759527e+00 2.957517e+00 3.258101e+00 1.167385e+00 8.582273e+01 1.475295e+00 2.881401e+01 1.663589e+01 1.825508e+00 845.000000 850.000000 4.784517e+00 1.000000 1.000000 1.033246e+01 4.094505e+01 2.683657e+01 4.337110e+00 5.905890e+00 1.000000 1.000000 1.960582e+01 1.087271e+01 4.250570e+01 1.029196e+01 1.090620e+01 9.955741e+00 1.046928e+01 1.000000 1.494211e+01 3.554178e+00 1.138480e+01 1.484692e+01 1.846216e+01 4.483076e+00 1.261968e+01 1.620437e+01 1.012261e+01 1.249260e+01 1.268213e+01 1.489108e+01 4.281581e+00 3.694579e+01 6.607522e+01 8.590238e+00 1.000000 1.000000 1.000000 1.000000 2.319833e+01 8.410750e+00 1.024818e+01 6.506817e+00 9.522276e+00 1.109706e+01 6.847049e+00 8.890807e+00 3.998750e+01 2.810285e+01 2.776801e+01 1.007347e+01 6.829156e-01 1.620335e+00 1.178541e+01 5.779400e+01 3.432827e+01 2.617172e+01 1.225695e+01 1.109193e+01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
Part 4: Determine significant predictors

We can now start to explore the data and check for "significant" predictors. First we create training and test sets, then we fit several models and look at which predictors each of them treats as important. We also need to encode loan_status numerically first, so that LassoCV and DecisionTreeClassifier don't raise an exception.

In [24]:
y_train = y_train.replace({'Fully Paid': 1, 'Charged Off': 0})
y_test = y_test.replace({'Fully Paid': 1, 'Charged Off': 0})
In [25]:
alphas = (.1,.5,1,5,10,50,100)
fitted_lasso = LassoCV(alphas=alphas, max_iter=100000).fit(X_train, y_train)
In [26]:
print("Relevant variables according to variable selection via Lasso:\n")

# Pair each non-zero coefficient with its column name. A list of pairs
# (instead of a dict keyed by coefficient) keeps features with equal values.
result = [(coef, col) for coef, col in zip(fitted_lasso.coef_, X_train.columns)
          if coef != 0]

for coef, col in sorted(result, reverse=True):
    print(coef, ":", col)
Relevant variables according to variable selection via Lasso:

0.0025568715312380545 : fico_range_low
9.484037990805908e-08 : fico_range_high
-0.027056759401023214 : sub_grade
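
As a minimal sketch of how Lasso-based selection behaves, the example below fits LassoCV on synthetic data in which only two of five features drive the response. The names X_demo, y_demo and lasso_demo, the feature names and the alpha grid are all assumptions made for illustration; none of this is the project data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

# Synthetic stand-in for X_train / y_train (assumption: not the project data)
rng = np.random.RandomState(0)
X_demo = pd.DataFrame(rng.normal(size=(500, 5)),
                      columns=["f0", "f1", "f2", "f3", "f4"])
# Only f0 and f3 influence the response
y_demo = 2.0 * X_demo["f0"] - 1.5 * X_demo["f3"] + rng.normal(scale=0.1, size=500)

lasso_demo = LassoCV(alphas=[0.01, 0.1, 0.5, 1.0], cv=5,
                     max_iter=100000).fit(X_demo, y_demo)

# Label coefficients by column name, keep the non-zero ones, sort descending
coefs = pd.Series(lasso_demo.coef_, index=X_demo.columns)
selected = coefs[coefs != 0].sort_values(ascending=False)

print("alpha chosen by cross-validation:", lasso_demo.alpha_)
print(selected)
```

Inspecting alpha_ shows which penalty the cross-validation picked; if it sits at the edge of the grid, a wider grid may be worth trying.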

As another check, we fit a Decision Tree and a Random Forest and use their feature importances to see which variables are important.

In [27]:
dec_tree = DecisionTreeClassifier(max_depth = 10).fit(X_train, y_train)
dec_feat_imp = dec_tree.feature_importances_
dec_feat_imp = 100.0 * (dec_feat_imp / dec_feat_imp.max())
index_sorted = np.argsort(dec_feat_imp)
In [28]:
print("Relevant variables according to variable selection via feature importance using Decision Tree:\n")

# Pair each non-zero importance with its column name. A list of pairs
# (instead of a dict keyed by importance) keeps features with tied values.
result = [(dec_feat_imp[index], X_train.columns[index]) for index in index_sorted
          if dec_feat_imp[index] != 0]

for importance, col in sorted(result, reverse=True):
    print(importance, ":", col)
Relevant variables according to variable selection via feature importance using Decision Tree:

100.0 : int_rate
21.8102858484958 : sub_grade
16.94175711634195 : dti
14.094453434442475 : avg_cur_bal
13.080815495411125 : revol_util
10.569284798288962 : max_bal_bc
9.871189426280306 : installment
9.635267369636885 : bc_util
8.257259236946746 : bc_open_to_buy
8.03682660063866 : loan_amnt
7.55227281428799 : annual_inc
7.535237938265158 : mort_acc
7.183557821318128 : total_bal_il
6.978006337551738 : num_op_rev_tl
6.681929181210858 : earliest_cr_line
6.6647724180393135 : num_il_tl
6.495857698957227 : revol_bal
5.86266611788391 : all_util
5.5760111611782746 : fico_range_low
5.373068940104285 : total_acc
4.913870239678806 : acc_open_past_24mths
4.911715130592808 : percent_bc_gt_75
4.88285808169326 : emp_length
4.869077222133283 : inq_fi
4.363778764675751 : open_il_24m
4.060260525951442 : fico_range_high
3.839255831719377 : il_util
3.778291312597342 : num_rev_accts
3.6556753140470355 : num_actv_bc_tl
3.385216405065445 : pct_tl_nvr_dlq
3.27511478961853 : total_cu_tl
2.957217239301227 : issue_d
2.8329885963891015 : num_accts_ever_120_pd
2.770657645256678 : num_bc_sats
2.4795100141578508 : num_rev_tl_bal_gt_0
2.3794481494134807 : inq_last_6mths
2.3607488322036545 : home_RENT
2.2923521705455467 : mths_since_last_major_derog
2.0924145209774383 : open_acc
1.9894908476587734 : open_act_il
1.943601352534964 : inq_last_12m
1.8967227847915737 : open_acc_6m
1.8777309406173048 : num_bc_tl
1.8479461799165948 : open_rv_12m
1.7908250334548428 : pub_rec
1.7092430983654299 : open_rv_24m
1.4033103897595263 : annual_inc_joint
1.3041945890565878 : term
1.2839171741293232 : verify_Source Verified
1.2731353625810171 : state_TX
1.1904176145772791 : home_MORTGAGE
1.115603902943565 : state_AZ
1.0751349671378039 : mths_since_recent_inq
1.035786589023599 : state_NY
0.9997873088827441 : num_tl_90g_dpd_24m
0.8570021187090296 : tax_liens
0.8300031343062144 : purp_moving
0.8178338144686026 : home_OWN
0.8177712853306784 : state_CO
0.7807049817271057 : state_MA
0.7327510518471184 : state_MN
0.6501714883144664 : sec_app_revol_util
0.6501304648949682 : state_IN
0.6199408293825635 : state_NC
0.5395886731087545 : state_GA
0.5092111857063009 : state_WA
0.4917577268287283 : delinq_2yrs
0.48911373766953464 : purp_debt_consolidation
0.4833483974121898 : state_SC
0.46941579383371534 : state_OH
0.46370556784124767 : purp_home_improvement
0.42247227390784226 : purp_other
0.41683887274399584 : purp_small_business
0.39731170292844176 : state_LA
0.39499320774409735 : state_MS
0.3825197380258621 : state_FL
0.3735544316658818 : sec_app_mort_acc
0.37096871705630163 : state_AL
0.3423415242799207 : num_tl_op_past_12m
0.28259306683074253 : pub_rec_bankruptcies
0.2661575325619406 : initial_list_status
0.25515226574095784 : state_NH
0.23991418079163468 : dti_joint
0.23484488167230042 : state_MI
0.23406261540647408 : state_IL
0.2150767939894469 : state_KY
0.18926757871071329 : verify_Verified
0.11688497242197435 : state_NV
0.059107297641197536 : state_PA
In [29]:
# the accuracy score for Decision Tree
accuracy_score(dec_tree.predict(X_test), y_test)
Out[29]:
0.60325
In [30]:
# Check by running a basic random forest (without tuning parameters)
rf = RandomForestClassifier(n_estimators=int(X_train.shape[1]/2),
                            max_depth = 2).fit(X_train, y_train)
print("The precision score for the basic RF is:")
precision_score(rf.predict(X_test), y_test, pos_label = 1)
The precision score for the basic RF is:
Out[30]:
0.5877750611246944
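
One detail worth checking in the call above: scikit-learn's precision_score takes its arguments in the order (y_true, y_pred), while the cell passes the predictions first. The sketch below, on toy labels that are purely illustrative, shows that reversing the arguments yields the recall of the original problem, so the reported value may correspond to recall rather than precision.

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels (assumption: illustrative only, not the project data)
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]

# Documented order: precision_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, pos_label=1)  # TP / (TP + FP)
swapped = precision_score(y_pred, y_true, pos_label=1)    # arguments reversed
recall = recall_score(y_true, y_pred, pos_label=1)        # TP / (TP + FN)

print(precision, swapped, recall)
```

Accuracy is symmetric in its two arguments, so the same reversal in accuracy_score does not change the result.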
In [31]:
# Check that the top features make sense
rf_feat_imp = rf.feature_importances_
rf_feat_imp = 100.0 * (rf_feat_imp / rf_feat_imp.max())
sorted_idx = np.argsort(rf_feat_imp)
pos = np.arange(sorted_idx.shape[0]) + .5

#Plot
plt.figure(figsize=(10,24))
plt.barh(pos, rf_feat_imp[sorted_idx], align='center')
plt.yticks(pos, X_train.columns[sorted_idx])
plt.xlabel('Relative Importance')
plt.title('Variable Importance');

Summary:

Based on the Lasso, decision tree and random forest results, the most interesting/significant variables are listed below. These are the top 41, ranked by whether they appeared as important in the Lasso, the Decision Tree and the Random Forest. Since sub_grade is a more detailed view of grade, we've only included sub_grade.

  • int_rate
    (Interest Rate on the loan)
  • sub_grade
    (LC assigned loan subgrade) - we'll skip grade, since sub_grade correlates with grade and is more granular
  • avg_cur_bal
    (Average current balance of all accounts)
  • dti
    (A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income)
  • bc_open_to_buy
    (Total open to buy on revolving bankcards)
  • fico_range_low
    (The lower boundary range the borrower’s FICO at loan origination belongs to), we will skip fico_range_high as it correlates with the low range and doesn't provide additional information.
  • installment
    (The monthly payment owed by the borrower if the loan originates)
  • emp_length
    (Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years)
  • term
    (The number of payments on the loan. Values are in months and can be either 36 or 60)
  • percent_bc_gt_75
    (Percentage of all bankcard accounts > 75% of limit)
  • verification_status
    (Indicates if income was verified by LC, not verified, or if the income source was verified)
  • revol_util
    (Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit)
  • all_util
    (Balance to credit limit on all trades)
  • home_ownership
    (The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER)
  • open_rv_24m
    (Number of revolving trades opened in past 24 months)
  • num_rev_tl_bal_gt_0
    (Number of revolving trades with balance >0)
  • loan_amnt
    (The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value)
  • annual_inc
    (The self-reported annual income provided by the borrower during registration)
  • mort_acc
    (Number of mortgage accounts)
  • max_bal_bc
    (Maximum current balance owed on all revolving accounts)
  • total_bal_il
    (Total current balance of all installment accounts)
  • pct_tl_nvr_dlq
    (Percent of trades never delinquent)
  • earliest_cr_line
    (The month the borrower's earliest reported credit line was opened)
  • bc_util
    (Ratio of total current balance to high credit/credit limit for all bankcard accounts)
  • revol_bal
    (Total credit revolving balance)
  • total_acc
    (The total number of credit lines currently in the borrower's credit file)
  • num_il_tl
    (Number of installment accounts)
  • acc_open_past_24mths
    (Number of trades opened in past 24 months)
  • il_util
    (Ratio of total current balance to high credit/credit limit on all install acct)
  • delinq_amnt
    (The past-due amount owed for the accounts on which the borrower is now delinquent)
  • num_actv_bc_tl
    (Number of currently active bankcard accounts)
  • num_bc_sats
    (Number of satisfactory bankcard accounts)
  • open_rv_12m
    (Number of revolving trades opened in past 12 months)
  • num_rev_accts
    (Number of revolving accounts)
  • open_acc
    (The number of open credit lines in the borrower's credit file)
  • open_acc_6m
    (Number of open trades in last 6 months)
  • open_act_il
    (Number of currently active installment trades)
  • num_op_rev_tl
    (Number of open revolving accounts)
  • total_cu_tl
    (Number of finance trades)
  • inq_fi
    (Number of personal finance inquiries)
  • inq_last_6mths
    (The number of inquiries in past 6 months (excluding auto and mortgage inquiries))
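
One possible way to turn "appeared as important in several models" into a ranking is to count, per feature, how many models give it a non-zero score. This is a sketch under assumptions, not necessarily how the list above was compiled: the scores frame holds toy values, state_XX is a hypothetical column, and breaking ties by decision-tree importance is an arbitrary choice.

```python
import pandas as pd

# Hypothetical scores per model (assumption: toy values, not project results)
scores = pd.DataFrame(
    {
        "lasso": [0.0, -0.027, 0.0026, 0.0],
        "decision_tree": [100.0, 21.8, 5.6, 0.0],
        "random_forest": [100.0, 60.0, 0.0, 0.0],
    },
    index=["int_rate", "sub_grade", "fico_range_low", "state_XX"],
)

# A feature gets one vote from each model that assigns it a non-zero score
votes = (scores != 0).sum(axis=1)

# Rank by number of votes, breaking ties with the decision-tree importance
ranking = scores.assign(votes=votes).sort_values(
    ["votes", "decision_tree"], ascending=False
)
print(ranking[["votes"]])
```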
Part 5: Exploring the data
In [32]:
# create dataframe with relevant predictors plus the response variable
eda_predictors = ['int_rate', 'sub_grade', 'avg_cur_bal', 'dti', 'bc_open_to_buy', 'fico_range_low', 'installment',
                  'emp_length', 'term', 'percent_bc_gt_75', 'verify_Verified', 'revol_util', 'all_util',
                  'home_MORTGAGE','home_RENT', 'open_rv_24m', 'num_rev_tl_bal_gt_0',
                  'loan_amnt', 'annual_inc', 'mort_acc', 'max_bal_bc', 'total_bal_il', 'pct_tl_nvr_dlq',
                  'earliest_cr_line', 'bc_util', 'revol_bal', 'total_acc', 'num_il_tl', 'acc_open_past_24mths',
                  'il_util', 'delinq_amnt', 'num_actv_bc_tl', 'num_bc_sats', 'open_rv_12m', 'num_rev_accts',
                  'open_acc','open_acc_6m', 'open_act_il', 'num_op_rev_tl',
                  'total_cu_tl', 'inq_fi', 'inq_last_6mths', 'loan_status']

# this reduced dataframe is created to be able to run a pairplot analysis
df_eda = df[eda_predictors]
In [33]:
df_eda.shape
Out[33]:
(20000, 43)
In [34]:
sns.pairplot(df_eda.sample(1000), hue='loan_status');

Some general functions to use for displaying plots

In [35]:
def get_barplot(x,y):
    '''Function to display regular bar plots
    Inputs:
        x: pd.Series representing the x values
        y: pd.Series representing the y values
    '''
    title = y.name + ' over ' + x.name
    plt.figure(figsize=(14,8))
    plt.bar(x, y, align='center', alpha=0.6)
    plt.title(title, fontsize=16)
    plt.xlabel(x.name, fontsize=14)
    plt.ylabel(y.name, fontsize=14)
    plt.grid()
    plt.show()
In [36]:
def get_hbarplot(x,y):
    '''Function to display horizontal bar plots
    Inputs:
        x: pd.Series representing the x values
        y: pd.Series representing the y values
    '''
    title = y.name + ' over ' + x.name
    plt.figure(figsize=(10,14))
    plt.barh(x, y, align='center', alpha=0.6)
    plt.title(title, fontsize=16)
    plt.xlabel(y.name, fontsize=14)
    plt.ylabel(x.name, fontsize=14)
    plt.grid()
    plt.show()
In [37]:
def get_histogram(y,b):
    '''Function to display a histogram
    Inputs:
        y: pd.Series representing the values to display
        b: the number of bins to show
    '''
    title = 'Distribution of ' + y.name
    plt.figure(figsize=(14,8))
    plt.hist(y, bins=b)
    plt.title(title, fontsize=16)
    plt.xlabel(y.name, fontsize=14)
    plt.ylabel("Numbers of {}".format(y.name), fontsize=14)
    plt.grid()
    plt.show()
In [38]:
def get_violinplot(x, y, data):
    '''Function to display violin plots
    Inputs:
        x: a string used with data to represent the x axis values
        y: a string used to represent the y axis values
        data: the dataframe
    '''
    title = y + ' over ' + x
    x_category = np.sort(data[x].unique())

    fig, ax1 = plt.subplots(figsize=(15, 8), sharey=True)

    for idx, val in enumerate(x_category):
        parts = ax1.violinplot(data.loc[data[x]==val, y], positions=[idx],
                               showmeans=True, showmedians=True, showextrema=True)
        for pc in parts['bodies']:
            pc.set_facecolor('#D43F3A')
            pc.set_edgecolor('black')
            pc.set_alpha(.6)

        for partname in ('cbars','cmins','cmaxes', 'cmedians', 'cmeans'):
            vp = parts[partname]
            vp.set_edgecolor('black')
            vp.set_alpha(0.6)

    ax1.set_title(title, fontsize=16)
    ax1.set_xlabel(x, fontsize=14)
    ax1.set_ylabel(y, fontsize=14)
    ax1.set_xticks(np.arange(x_category.size))
    ax1.set_xticklabels(x_category)
    plt.show()

Analyze the distribution of the loan amounts

In [39]:
get_histogram(original_df['loan_amnt'], 40)
In [40]:
original_df['loan_amnt'].describe()
Out[40]:
count    334109.000000
mean      14477.898306
std        9132.981886
min        1000.000000
25%        7200.000000
50%       12000.000000
75%       20000.000000
max       40000.000000
Name: loan_amnt, dtype: float64

We can see that round loan amounts in $5,000 increments (e.g. 10k, 15k, 20k) are the most popular. Most loans are around $10,000; the counts gradually decrease from there up to $35k, where they increase again slightly. 50% of all loans are at or below $12,000 and 75% are at or below $20,000.
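
The two observations in this paragraph can also be checked numerically instead of read off the histogram. The following is a minimal sketch on a toy Series; the values are invented for illustration, and with the real data original_df['loan_amnt'] would take the place of loan_amnt.

```python
import pandas as pd

# Toy loan amounts (assumption: illustrative values, not the Lending Club data)
loan_amnt = pd.Series([5000, 7200, 10000, 10000, 12000,
                       15000, 20000, 24000, 35000, 40000])

# Share of loans whose amount is an exact multiple of $5,000
round_share = (loan_amnt % 5000 == 0).mean()

# Median and upper quartile, the statistics describe() reports as 50% and 75%
q50, q75 = loan_amnt.quantile([0.5, 0.75])

print(round_share, q50, q75)
```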

Analyze the loan amount by sub grade

In [41]:
get_violinplot('sub_grade', 'loan_amnt', original_df)

The plot suggests a trend: smaller loans are usually taken by borrowers with higher ratings (grades), while larger loans are usually taken by borrowers with lower credit ratings.

Analyze loans issued and interest rates by states and sub grades

We can also show the total loan volume, the average income and the average interest rate by state to see where most lending took place and under which conditions.

In [42]:
from collections import OrderedDict

# Total loan volume, mean interest rate and mean annual income per state
by_loan_amount = original_df.groupby(['addr_state'], as_index=False).loan_amnt.sum()
by_interest_rate = original_df.groupby(['addr_state'], as_index=False).int_rate.mean()
by_income = original_df.groupby(['addr_state'], as_index=False).annual_inc.mean()

states = by_loan_amount['addr_state'].values.tolist()
total_loan_amounts = by_loan_amount['loan_amnt'].values.tolist()  # sums, not averages
average_interest_rates = by_interest_rate['int_rate'].values.tolist()
average_annual_income = by_income['annual_inc'].values.tolist()

metrics_data = OrderedDict([('state_codes', states),
                            ('issued_loans', total_loan_amounts),
                            ('interest_rate', average_interest_rates),
                            ('annual_income', average_annual_income)])

metrics_df = pd.DataFrame.from_dict(metrics_data)
metrics_df = metrics_df.round(decimals=2)
metrics_df.head()
Out[42]:
state_codes issued_loans interest_rate annual_income
0 AK 12491300.0 13.53 80730.84
1 AL 55503475.0 14.13 71242.99
2 AR 33473650.0 13.92 69699.80
3 AZ 122958650.0 13.46 75824.19
4 CA 725545250.0 13.54 87677.63
In [43]:
data = metrics_df.sort_values(['issued_loans'])
get_hbarplot(data['state_codes'], data['issued_loans'])
In [44]:
data = metrics_df.sort_values(['annual_income'])
get_hbarplot(data['state_codes'], data['annual_income'])
In [45]:
high_loan_states = ['CA', 'TX', 'NY', 'FL']
highloans = metrics_df[metrics_df['state_codes'].isin(high_loan_states)]
display(highloans)

highloans_perc = (highloans['issued_loans'].sum() / metrics_df['issued_loans'].sum()) * 100
print("Percentage of loans made in states CA, TX, NY, FL relative to all loans: {:.2f}".format(highloans_perc))
state_codes issued_loans interest_rate annual_income
4 CA 725545250.0 13.54 87677.63
9 FL 342036825.0 13.82 74264.63
33 NY 377734975.0 13.89 82417.00
42 TX 427782800.0 13.57 83526.73
Percentage of loans made in states CA, TX, NY, FL relative to all loans: 38.72

Looking at the data, the four states CA, TX, NY and FL account for almost 40% (38.72%) of the total loan volume issued in 2016 and 2017.

Looking at the average annual incomes, three of these states (CA, NY and TX) are in the top 10, but FL ranks lower, with an average annual income of about $74,265 compared with roughly $82,000 to $88,000 for the other three. This raises the question of why so many loans were issued in FL.
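
A state's income rank can be computed directly instead of being read off the bar chart. The sketch below uses a toy frame holding only the five rows shown in the outputs above, so the ranks it prints are relative to those five states and are not the nationwide ranks; with the full data, metrics_df would take the place of toy.

```python
import pandas as pd

# Toy stand-in for metrics_df (assumption: only the five rows printed above,
# so these ranks do not reflect the position among all states)
toy = pd.DataFrame({
    "state_codes": ["CA", "FL", "NY", "TX", "AK"],
    "annual_income": [87677.63, 74264.63, 82417.00, 83526.73, 80730.84],
})

# Rank 1 = highest average annual income
toy["income_rank"] = toy["annual_income"].rank(ascending=False).astype(int)

print(toy.sort_values("income_rank"))
```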

The average interest rate is fairly evenly distributed across the states, roughly between 12.5% and 14.5% (see below).

In [46]:
data = metrics_df.sort_values(['interest_rate'])
get_hbarplot(data['state_codes'], data['interest_rate'])

Interest Rate by Sub Grade

In [47]:
group1 = original_df.groupby(['sub_grade'], as_index=False).int_rate.mean()
group1.head()
Out[47]:
sub_grade int_rate
0 A1 5.320000
1 A2 6.769053
2 A3 7.152234
3 A4 7.483164
4 A5 8.034195
In [48]:
get_barplot(group1['sub_grade'], group1['int_rate'])

As expected, interest rates go up as the loan sub grades get worse.

Analyze loan default rates

Defaulted loans based on interest rates

In [49]:
original_df['int_round'] = original_df['int_rate'].round(0).astype(int)
df_y = original_df[['int_round','loan_status']]
crosst_y = pd.crosstab(original_df['int_round'], original_df['loan_status'])
crosst_y['Fully Paid %'] = crosst_y['Fully Paid'] / (crosst_y['Fully Paid'] + crosst_y['Charged Off'])*100
crosst_y
Out[49]:
loan_status Charged Off Fully Paid Fully Paid %
int_round
5 504 11810 95.907098
6 276 3428 92.548596
7 1538 18841 92.453015
8 2702 20635 88.421819
9 2412 15833 86.779940
10 3049 18015 85.525066
11 8184 34952 81.027448
12 2559 9219 78.273051
13 7896 25552 76.393207
14 8627 23980 73.542491
15 4140 10870 72.418388
16 5474 13444 71.064595
17 4029 7806 65.956907
18 3900 7656 66.251298
19 3805 6705 63.796384
20 3760 5870 60.955348
21 2979 4562 60.495955
22 1714 1976 53.550136
23 1404 1995 58.693733
24 1891 2550 57.419500
25 2242 2901 56.406766
26 1882 2428 56.334107
27 763 561 42.371601
28 463 355 43.398533
29 894 954 51.623377
30 677 785 53.693570
31 1229 1433 53.831705
In [50]:
crosst_y[['Charged Off','Fully Paid']].plot(kind='bar', stacked=True, figsize=(12,8))
ax2 = plt.twinx()
sns.lineplot(data=crosst_y['Fully Paid %'], color="b", ax=ax2)
plt.title('Fully Paid and Charged Off loans by interest rate')
plt.show()

The interest rate appears to be a useful predictor of which loans might default in the future: the share of fully paid loans falls from about 96% at a 5% rate to roughly 54% at 31%. Higher interest rate loans are more likely to default.
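
The percentage column can also be obtained in one step with the normalize argument of pd.crosstab. This is a small sketch on toy data; the loans frame and its values are assumptions for illustration only.

```python
import pandas as pd

# Toy loans (assumption: illustrative, not the project data)
loans = pd.DataFrame({
    "int_round": [5, 5, 5, 5, 20, 20, 20, 20],
    "loan_status": ["Fully Paid"] * 4
                   + ["Fully Paid", "Fully Paid", "Charged Off", "Charged Off"],
})

# normalize='index' turns each row of counts into shares that sum to 1
shares = pd.crosstab(loans["int_round"], loans["loan_status"],
                     normalize="index") * 100

print(shares)
```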

Defaulted loans based on loan purpose and loan grades

In [51]:
cm = sns.light_palette("red", as_cmap=True)
crosst = pd.crosstab(original_df['purpose'], original_df['loan_status'])
crosst['Charged Off %'] = crosst['Charged Off'] / (crosst['Fully Paid'] + crosst['Charged Off'])*100
crosst.style.background_gradient(cmap = cm)
Out[51]:
loan_status Charged Off Fully Paid Charged Off %
purpose
car 662 3228 17.018
credit_card 13753 50928 21.2628
debt_consolidation 47602 144826 24.7376
home_improvement 5138 20460 20.0719
house 470 1575 22.9829
major_purchase 1856 6441 22.3695
medical 1211 3419 26.1555
moving 766 2036 27.3376
other 5509 17519 23.9231
renewable_energy 73 184 28.4047
small_business 1331 2251 37.158
vacation 622 2248 21.6725
wedding 0 1 0

From the original data set we know our "base charged off rate", meaning the percentage of charged off loans among all loans made in 2016-2017, is 23.64%. All loan purposes that show a higher percentage are therefore at a higher-than-average risk of default. In this case those are debt_consolidation, medical, moving, renewable_energy, small_business and other.

In [52]:
loan_grade = ['loan_status', 'grade']
cm = sns.light_palette("red", as_cmap=True)
cross_u = pd.crosstab(df[loan_grade[0]], original_df[loan_grade[1]], normalize='columns').style.background_gradient(cmap = cm)
cross_u
Out[52]:
grade A B C D E F G
loan_status
Charged Off 0.207161 0.380218 0.528376 0.633006 0.702936 0.793609 0.774648
Fully Paid 0.792839 0.619782 0.471624 0.366994 0.297064 0.206391 0.225352

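We see above that the charged-off share rises from about 21% for grade A to roughly 77% to 79% for grades F and G, so loans with grades of D and lower are far more likely to lead to a default than higher rated loans with, for instance, grades A or B. Note that loan_status is taken from df while grade comes from original_df, so the shares cover only the rows whose index appears in both frames and may not be directly comparable to the 23.64% base rate.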
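
The cell above passes one Series from df and one from original_df to pd.crosstab. When the inputs are Series, pandas aligns them on their shared index labels, which the sketch below illustrates with a toy frame and a three-row subset of it. The names full and subset are hypothetical stand-ins, not the project frames.

```python
import pandas as pd

# Toy "full" frame and a 3-row subset of it (assumption: illustrative only)
full = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "C", "C"],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid",
                    "Charged Off", "Charged Off", "Charged Off"],
})
subset = full.loc[[0, 3, 5]]

# Series from different frames are aligned on their shared index labels,
# so only the three subset rows are counted
mixed = pd.crosstab(subset["loan_status"], full["grade"])

# Using one frame for both inputs counts every row of that frame
consistent = pd.crosstab(full["loan_status"], full["grade"])

print(mixed.values.sum(), consistent.values.sum())
```

Taking both columns from the same frame makes it explicit which population the proportions describe.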

Defaulted loans based on state

In [53]:
fig = plt.figure(figsize=(18,10))
original_df[original_df['loan_status']=="Charged Off"].groupby('addr_state')['loan_status'].count().sort_values().plot(kind='barh')
plt.ylabel('State',fontsize=15)
plt.xlabel('Number of loans',fontsize=15)
plt.title('Number of defaulted loans per state',fontsize=20);

The states of CA, NY, TX and FL have the highest number of defaulted loans, but that is simply because they have the highest number of loans in general. To get a meaningful default rate by state, we need to calculate the defaulted loans relative to the number of loans issued in that state and then compare that to the national default average of 23.64%. The table below shows the states with a higher risk of default compared to the national average.

In [54]:
# Overall charged-off rate (in %) from the original data set
average = 23.64

crosst = pd.crosstab(original_df['addr_state'], original_df['loan_status'])
crosst['Charged Off %'] = crosst['Charged Off'] / (crosst['Fully Paid'] + crosst['Charged Off'])*100
# Filter first, then sort, so the boolean mask matches the frame it indexes
crosst[crosst['Charged Off %'] > average].sort_values('Charged Off %', ascending=False)
Out[54]:
loan_status Charged Off Fully Paid Charged Off %
addr_state
AR 734 1802 28.943218
LA 1063 2672 28.460509
MS 572 1451 28.274839
OK 861 2192 28.201769
AL 1139 2971 27.712895
NY 7005 18971 26.967200
NJ 2894 8350 25.738171
NE 430 1244 25.686977
MD 1938 5618 25.648491
AK 188 553 25.371120
FL 6171 18528 24.984817
MO 1269 3872 24.683914
KY 818 2496 24.683162
OH 2659 8146 24.608977
NV 1341 4109 24.605505
IN 1447 4446 24.554556
PA 2636 8113 24.523211
NM 441 1386 24.137931
TX 6767 21442 23.988798
CA 11416 36307 23.921380
MN 1412 4497 23.895752
MI 2128 6786 23.872560
VA 2108 6796 23.674753
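
Instead of typing the base rate in by hand, it can be derived from the same data that feeds the per-state table, which keeps the two consistent if the data set changes. The following is a minimal sketch on toy data; the loans frame and the state codes AA and BB are hypothetical.

```python
import pandas as pd

# Toy loans (assumption: illustrative, not the project data)
loans = pd.DataFrame({
    "addr_state": ["AA", "AA", "AA", "AA", "BB", "BB", "BB", "BB"],
    "loan_status": ["Charged Off", "Charged Off", "Fully Paid", "Fully Paid",
                    "Charged Off", "Fully Paid", "Fully Paid", "Fully Paid"],
})

# Overall charged-off rate computed from the data rather than hardcoded
is_charged_off = loans["loan_status"].eq("Charged Off")
base_rate = is_charged_off.mean() * 100

# Per-state charged-off rate, keeping only states above the overall rate
state_rate = is_charged_off.groupby(loans["addr_state"]).mean() * 100
above = state_rate[state_rate > base_rate].sort_values(ascending=False)

print(base_rate)
print(above)
```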
In [ ]: